OM nucleic - nucleic search, using sw model
Run on: February 26, 2004, 00:47:23 



Search time 4508.21 Seconds (without alignments)
17679.337 Million cell updates/sec



Title:          US-09-989-981A-7
Perfect score:  2669
Sequence:       1 gtgtccctgctccaggaaac..........caattaaaaatgtattgagc 2669



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 27513289 seqs, 14931090276 residues 

Total number of hits satisfying chosen parameters: 55026578

Minimum DB seq length: 0
Maximum DB seq length: 2000000000



Post-processing : 



Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 
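The throughput figure in the header can be cross-checked from the other header values. The sketch below assumes that one cell update is one (query residue, database residue) pair, and that both strands were searched; the factor of two is an inference from the hit count (55026578) being exactly twice the number of sequences searched (27513289), not something the report states. The function name is illustrative only.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Estimate dynamic-programming throughput for a database search.

    Assumption: every query residue is compared against every database
    residue once per strand searched (strands=2 is inferred, not reported).
    """
    cells = strands * query_len * db_residues
    return cells / seconds / 1e6


# Values copied from the report header.
rate = million_cell_updates_per_sec(
    query_len=2669,            # Perfect score / query length
    db_residues=14931090276,   # "Searched: ... residues"
    seconds=4508.21,           # "Search time"
)
print(f"{rate:.1f} million cell updates/sec")
```

Under these assumptions the result agrees with the reported 17679.337 million cell updates/sec to within rounding.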



Database:       EST:*
                 1: em_estba:*
                 2: em_esthum:*
                 3: em_estin:*
                 4: em_estmu:*
                 5: em_estov:*
                 6: em_estpl:*
                 7: em_estro:*
                 8: em_htc:*
                 9: gb_est1:*
                10: gb_est2:*
                11: gb_htc:*
                12: gb_est3:*
                13: gb_est4:*
                14: gb_est5:*
                15: em_estfun:*
                16: em_estom:*
                17: em_gss_hum:*
                18: em_gss_inv:*
                19: em_gss_pln:*
                20: em_gss_vrt:*
                21: em_gss_fun:*
                22: em_gss_mam:*
                23: em_gss_mus:*
                24: em_gss_pro:*
                25: em_gss_rod:*
                26: em_gss_phg:*
                27: em_gss_vrl:*
                28: gb_gss1:*
                29: gb_gss2:*

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                     %
 Result            Query
    No.    Score   Match  Length  DB  ID         Description
 -------------------------------------------------------------------
      1   1511.6    56.6    3623  11  AK004871   AK004871 Mus muscu
      2   1286.2    48.2    2417  11  AK050938   AK050938 Mus muscu
      3    681.8    25.5     691  13  BX481838   BX481838 DKFZp686M
      4    460.6    17.3     849  12  BI330745   BI330745 602982409
      5    370.4    13.9     549  10  BF660076   BF660076 maa27c08.
      6    361.4    13.5     583  13  BY705076   BY705076 BY705076
      7    355.8    13.3     457  14  T91380     T91380 yd53b02.s1
      8    332.4    12.5     334  13  BX482362   BX482362 DKFZp686F
      9    331.2    12.4     510  10  BB610072   BB610072 BB610072
     10      323    12.1     511   9  AI157365   AI157365 ui45h01.y
     11    318.4    11.9     564  14  T84531     T84531 yd53b02.r1
     12    309.8    11.6     500   9  AI151811   AI151811 ui46c10.y
     13    276.8    10.4     463   9  AA537862   AA537862 vj35a03.r
     14      276    10.3     781  14  CB502603   CB502603 ssalmge50
     15    263.2     9.9     640  14  CD739823   CD739823 4028769 1
     16      236     8.8     613  14  CF367733   CF367733 852301 MA
     17    234.6     8.8     398   9  AI597406   AI597406 vj35a03.y
     18    226.6     8.5     586  11  AK008188   AK008188 Mus muscu
     19    226.4     8.5     581  13  BY708144   BY708144 BY708144
     20    202.8     7.6     435  13  BX099922   BX099922 BX099922
     21    198.4     7.4     435   9  AI574075   AI574075 uj67h11.y
     22      198     7.4     762  29  CC659228   CC659228 OGUFF57TV
     23    196.4     7.4     821  28  BZ650554   BZ650554 OGCBA89TC
     24      194     7.3     916  29  CG323718   CG323718 OG0DQ45TH
 c   25    193.2     7.2     709  29  CC695831   CC695831 OGUL023TV
     26    187.2     7.0     776  29  CG327545   CG327545 OGWFJ96TV
 c   27    183.8     6.9     891  29  CG368338   CG368338 OG3BP65TV
     28    183.2     6.9     826  29  CG214497   CG214497 OG1BM08TV
     29    180.4     6.8     578  14  CF366327   CF366327 840972 MA
     30      171     6.4     849  29  CG270361   CG270361 OGWFS70TH
     31      171     6.4     857  29  CG271003   CG271003 OG0EJ71TV
 c   32    166.8     6.2     861  29  CG262933   CG262933 OG1DH53TV
     33    162.4     6.1     345  14  CD730599   CD730599 4038931 1
     34    155.6     5.8     839  29  CG262656   CG262656 OG1AN46TH
 c   35      151     5.7     912  29  CC604602   CC604602 OGUFQ75TH
     36      150     5.6     909  29  CG268466   CG268466 OG2BT15TH
     37    149.2     5.6     523   9  AU195806   AU195806 AU195806
     38    145.4     5.4     447  12  BI145065   BI145065 602909138
     39    144.2     5.4     936  10  BF162656   BF162656 601769307
     40    143.2     5.4     694  13  CA140253   CA140253 SCEZRT202
     41    141.6     5.3     823  14  CB649273   CB649273 OSJNEbl3B
     42      141     5.3     833  10  BF620684   BF620684 HVSMEc002
     43    138.8     5.2     566   9  AU192726   AU192726 AU192726
     44      138     5.2     560   9  AU192604   AU192604 AU192604
     45    137.8     5.2     754  14  CB627408   CB627408 OSIIEb02F
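The "% Query Match" column in the summaries appears to be each hit's score expressed as a percentage of the perfect score (2669), rounded to one decimal place. A minimal sketch of that arithmetic follows; the function name and the rounding rule are assumptions, and only three rows from the summaries are used as sample input.

```python
PERFECT_SCORE = 2669  # "Perfect score" from the report header


def query_match_pct(score, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect, 1)


# (accession, score) pairs copied from the summaries.
sample = [("AK004871", 1511.6), ("AK050938", 1286.2), ("BX481838", 681.8)]
for accession, score in sample:
    print(accession, query_match_pct(score))
```

For these three rows the computed values match the percentages listed in the summaries (56.6, 48.2 and 25.5).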



ALIGNMENTS 



RESULT 1 
AK004871
LOCUS       AK004871     3623 bp    mRNA    linear   HTC 20-SEP-2003
DEFINITION  Mus musculus adult male liver cDNA, RIKEN full-length enriched
            library, clone:1300003C16 product:ATP-BINDING CASSETTE, SUB-FAMILY
            G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus], full insert
            sequence.
ACCESSION   AK004871
VERSION     AK004871.1  GI:12836380
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   The RIKEN Genome Exploration Research Group Phase II Team and the
            FANTOM Consortium.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409, 685-690 (2001)
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 3623)
  AUTHORS   Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
            Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
            Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
            Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
            Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
            Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
            Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
            Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
            Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues. First strand cDNA was primed with a primer
            [5' GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
            prepared by using trehalose thermo-activated reverse transcriptase
            and subsequently enriched for full-length by cap-trapper. Second
            strand cDNA was prepared with the primer adapter of sequence [5'
            GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTTAATTAAACCCCCCCCCCC 3']. cDNA was
            cleaved with XhoI and SstI. Cloning sites, 5' end: SstI; 3' end:
            XhoI. Host: SOLR.
FEATURES             Location/Qualifiers
     source          1..3623
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:1300003C16"
                     /db_xref="MGI:1896857"
                     /db_xref="taxon:10090"
                     /clone="1300003C16"
                     /sex="male"
                     /tissue_type="liver"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="adult"
     CDS             69..2090
                     /note="unnamed protein product; ATP-BINDING CASSETTE,
                     SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus]
                     (SWISSPROT|Q9DBM0, evidence: FASTY, 92%ID, 96.7%length,
                     match=1796)
                     putative"
                     /codon_start=1
                     /protein_id="BAB23630.1"
                     /db_xref="GI:12836381"
                     /translation="MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQ
                     SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQML
                     AIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLP
                     NLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE
                     RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSD
                     IFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKE
                     REVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVEL
                     PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAA
                     LLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV
                     IIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNAL
                     YNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI
                     LGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"
     polyA_signal    3605..3610
                     /note="putative"
     polyA_site      3623
                     /note="putative"
ORIGIN



Query Match 56.6%; Score 1511.6; DB 11; Length 3623;
Best Local Similarity 77.0%; Pred. No. 0;
Matches 1965; Conservative 0; Mismatches 534; Indels 53; Gaps 8;
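The summary statistics for this alignment can be checked against one another. The sketch below rests on assumptions the report does not state: that IDENTITY_NUC scores +1 per match and -0.6 per mismatch, and that each gap costs Gapop (10.0) once plus Gapext (1.0) per inserted or deleted position. The mismatch value in particular is inferred only because it reproduces the reported scores of Results 1 and 2; the function names are illustrative.

```python
def alignment_score(matches, mismatches, indels, gaps,
                    match=1.0, mismatch=-0.6, gap_open=10.0, gap_ext=1.0):
    """Recompute a local alignment score from its summary counts.

    Assumed model: each gap pays gap_open once, and every gapped
    position (indel) additionally pays gap_ext.
    """
    return (matches * match + mismatches * mismatch
            - gaps * gap_open - indels * gap_ext)


def best_local_similarity(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns."""
    return 100.0 * matches / (matches + mismatches + indels)


# Result 1 (AK004871): Matches 1965; Mismatches 534; Indels 53; Gaps 8
print(round(alignment_score(1965, 534, 53, 8), 1))
print(round(best_local_similarity(1965, 534, 53), 1))

# Result 2 (AK050938): Matches 1696; Mismatches 483; Indels 50; Gaps 7
print(round(alignment_score(1696, 483, 50, 7), 1))
print(round(best_local_similarity(1696, 483, 50), 1))
```

With these assumed parameters the recomputed values agree with the scores (1511.6 and 1286.2) and Best Local Similarity figures (77.0% and 76.1%) printed for the two alignments. That supports the assumed scoring model but does not confirm it.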



Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158

I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I
Db 68 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 127

Qy 159 CTC---GGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCAC 215

II I I I I II I I I I II II I I I I I M M I I II I I I I I II I I I I I I I M I I I I I I I
Db 128 TTCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCAC 187

Qy 216 CTACAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGC 275

I I I I I I I I I III I I II I I I I I I I II II I I I I I I INI I I I I I I I I M I I I I II
Db 188 CTACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGC 247

Qy 276 CTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAG 335

I II I I I I I I M I M M I I I I I I I II I I I I I M I I I I I I I II M I I I I I I I I I II
Db 248 CTCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAG 307

Qy 336 CTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCA 395

I I I II I II I I M I I M I I I I I I II II I I I I I I I I I M I I I I I I I I I I I M
Db 308 CAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACA 367

Qy 396 GATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCAC 455

M I I I I I II I I II I I I I II I I I I I I I I II I I I I I I I I I I I I II II I I I II I I I
Db 368 GATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCAC 427

Qy 456 TGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAG 515

III I I I I I I I I I I I I I I I II II I I I I I II II I I I II I I I I I I I I I I I I I
Db 428 AGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAG 487

Qy 516 CTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCC 575

I I I I I I I I I II I I I I I I I I I I I II II II II II I Mill I I I I I I II I II
Db 488 TACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCC 547

Qy 576 CAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTT 635

II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 548 CAACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTT 607



Qy 636 CTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCA 695 

I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 608 CTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCA 667 

Qy 696 GTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAG 755 

I I I I I I I I I I I I I I I I I I I I I I III II II III I II I II II I I I I I I I I I 
Db 668 GTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCG 727 

Qy 756 GAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACC 815 

I II I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I II I I I I I 
Db 728 ACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACC 787 

Qy 816 CACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGC 875 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I II 
Db 788 CACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGC 847

Qy 876 CAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCT 935 

III I I II I I I I I I I I I M I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 848 CAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCT 907 

Qy 936 GTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCA 995 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 908 ATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCA 967 

Qy 996 CATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGA 1055 

I I I I I I I I I I I I I I M MM III I II I M II II I M II II I II Mill II 

Db 968 AATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGA 1027 

Qy 1056 CTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAG 1115 

I I II I I I I I I I I I I I I II I M I II I I I II I I I I I III I III I II I II I I 

Db 1028 CTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGT 1087 

Qy 1116 GGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTT 1175 

II I II I I I I II I I I II I I II I II I I I I II I I I II II II I I III I I II I I I I 

Db 1088 GGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTT 1147

Qy 1176 TCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGAC 1235

I I I I II I II I I II I I I I II I I I II I II I I III 

Db 1148 TCTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCT 1207 

Qy 1236 CCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTT 1295 

MM II I II II I I I I I I I II II I I II I II I I I II II 

Db 1208 CACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTT 1264

Qy 1296 TACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCAT 1355 

I I II II II II II I I II M I I I I I I I M II II I II I I I II I II I II I II I II I I 
Db 1265 TTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCAT 1324 

Qy 1356 CCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGG 1415 

M II II I II I I II I I I I I II II I I I III II I M II I II I II I I I I II 

Db 1325 TCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGG 1384 

Qy 1416 GAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCAT 1475 

I II II I II II I II II II II I II I I I MUM I II II I II I M II II II I 
Db 1385 GGCCAAGCAGCTCTCCTTCATGGACACAGCAGCGCTCCTCTTCATGATAGGGGCGCTCAT 1444 



Qy 1476 CCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTA 1535

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 1445 TCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTA 1504

Qy 1536 CTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGG 1595 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 1505 CTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGG 1564 

Qy 1596 GGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGC 1655 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I II I 

Db 1565 AGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGAC 1624 

Qy 1656 CAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGT 1715

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I 

Db 1625 AAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGT 1684 

Qy 1716 CTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGC 1775 

I I I II I I I I I I I I I I II I I I I I I I III I Ml I I I I I I I M I I I I II I I I I I 
Db 1685 CTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTC 1744

Qy 1776 CTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAA 1835 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I 
Db 1745 CTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAA 1804 

Qy 1836 CTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTG 1895 

I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 1805 CTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTG 1864 

Qy 1896 TTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAA 1955 

II I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I 

Db 1865 CTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAA 1924

Qy 1956 CCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCC 2015

I I I I II M I II I I I I I I II I I I I I I I I I I I I I I M I I I I I I I I 

Db 1925 CTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCC 1984 

Qy 2016 TCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTA 2075

I I I I I II I I I I I I I I I I I I I I I I III Mill II II M II I I I II II II I 
Db 1985 ACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTA 2044 

Qy 2076 CGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGT 2135 

I I M I I I I I M I II II II I I I I I I I I I I II I I I M I I I I I I I 
Db 2045 TCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGATACTCAGCCTTGCT 2104

Qy 2136 CTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCC 2195 

I I I I I I I II I I II M I I I I 

Db 2105 CTCACTGGCGG------------GACCCTTTTCCCGGGGCTGGCCACCCCAGGAGGAGCC 2152

Qy 2196 TTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTG 2255 

I I I I II I I I I Ml III Ml II MM II III 

Db 2153 GGACTGGGGACAAGGCTCACACAGATCTCTCAG-------GCAGCAGCCACCTCTTAGTG 2205

Qy 2256 CTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTT 2315

I II I II I II II I II II I I I I II I I II I M I II I I II II I II I I M I II III Ml 
Db 2206 CTGCAGTGGCACAGGTCAGCCACAGGATGGCAGTAGAATAAAGACAGTTGAGAGGTGTTT 2265 

Qy 2316 CTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTCGGTGGCACCTACA 2375 



Db 2266 CTGCTCCCAGGCCCAGGCTTGTGATGGGAGAGAGAGAA---------------ACCAGGT 2310

Qy 2376 ACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTCGATATAGGATGGG 2435 

I I I I I I I I I I MM Ml M M M I M M I M I I M I M 

Db 2311 ACGTTGCTCATGCATTT--------TATATCTTTAAATAAACAACCCAGTATGGAATGGG 2362

Qy 2436 AGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGA 2495

I I M M M M M M M M M M M M M I M M M M I I Mil 

Db 2363 AACCAATTATATATGAATTGAGTAGCTAGGCTATGCAGAAATTTCTGGAATCCTGAGAGG 2422 

Qy 2496 ACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGA 2555

III I I I I I I I II I II I I II II I II I I 

Db 2423 ATAGTGGTTTATAGCAAAGTGTTTAACTTTCTCTTCTACCATTCTCACAC----TGTTAA 2478

Qy 2556 GCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATG-CCATCCCTTCTTTT 2614 

Mill I I I II II I I I I M II I I I I I II II II I I I M I I I 

Db 2479 GCCACTCCCAATACAAAGGGCGACCTAAAACAAACTAGCAAAATGTTTTTCGCTTATCTC 2538 

Qy 2615 TGTGTGGGGTCATGGGCTCCAAAAGCCAACGT 2646 

II I I II I I I I I I I II I II M I I II 

Db 2539 TGCGTGGATTCATGGACTCCAACCCCCAAAGT 2570



RESULT 2 
AK050938 
LOCUS       AK050938     2417 bp    mRNA    linear   HTC 20-SEP-2003
DEFINITION  Mus musculus 9 days embryo whole body cDNA, RIKEN full-length
            enriched library, clone:D030040P06 product:ATP-BINDING CASSETTE,
            SUB-FAMILY G, MEMBER 8 (STEROLIN-2) homolog [Mus musculus], full
            insert sequence.
ACCESSION   AK050938
VERSION     AK050938.1  GI:26094211
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   The RIKEN Genome Exploration Research Group Phase II Team and the
            FANTOM Consortium.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409, 685-690 (2001)
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 2417)
  AUTHORS   Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
            Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
            Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
            Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
            Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
            Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
            Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JUL-2001) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site for further details.
            URL:http://genome.gsc.riken.go.jp/
            URL:http://fantom.gsc.riken.go.jp/.
FEATURES             Location/Qualifiers
     source          1..2417
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:D030040P06"
                     /db_xref="MGI:2418860"
                     /db_xref="taxon:10090"
                     /clone="D030040P06"
                     /tissue_type="whole body"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="9 days embryo"
     misc_feature    1..2417
                     /note="ATP-BINDING CASSETTE, SUB-FAMILY G, MEMBER 8
                     (STEROLIN-2) homolog [Mus musculus] (SWISSPROT|Q9DBM0,
                     evidence: FASTY, 92%ID, 96.7%length, match=1796)"
ORIGIN

Query Match 48.2%; Score 1286.2; DB 11; Length 2417; 

Best Local Similarity 76.1%; Pred. No. 1e-286;

Matches 1696; Conservative 0; Mismatches 483; Indels 50; Gaps 7; 

Qy 419 CAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGA 478 

I I I I II I I I I I I I I I II I II II I I I I I M I III II I I I I II I I I I I I I I 

Db 184 CAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGCAGAGGCCACGGTGGCAAGA 243

Qy 479 TCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGT 538 

I II I II II II II II I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 244 TGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACGCCTCAGCTGGTGAGGAAGT 303 

Qy 539 GTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCT 598 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 304 GCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAACCTGACCGTCAGAGAGACCC 363 

Qy 599 TGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAA 658 

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 

Db 364 TGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCCCAGGCCCAGCGTGACAAAC 423 

Qy 659 GGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCA 718 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I 
Db 424 GGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGCGCCAACACCAGAGTGGGCA 483

Qy 719 ACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGC 778 

III III M II III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 484 ACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGAGTGAGCATTGGGGTGCAGC 543 

Qy 779 TCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCA 838 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I M I I I I I I I I I 

Db 544 TCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACTTCTGGCCTCGACAGCTTCA 603 

Qy 839 CAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCA 898 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 604 CAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAGGGCAACAGGCTGGTGCTCA 663 

Qy 899 TCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGA 958 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 664 TCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTTGACCTGGTCCTTCTGATGA 723 

Qy 959 CGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCA 1018 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 724 CATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATGGTGCAGTACTTCACATCCA 783 

Qy 1019 TCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCA 1078 

I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 784 TTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTCTACGTGGACTTGACCAGCA 843 



Qy 1079 TTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAG 1138

I I I I I I I I I I I I I III I III II I I II I I I I I I I I II I I II I I II MM 

Db 844 TCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAGAAGGCACAGTCTCTTGCAG 903 

Qy 1139 CCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGG 1198 

II I II II I II II I II I I I I I III II II II II II I II M II M III I II I I 

Db 904 CCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTGTGGAAAGCTGAGGCAAAGG 963

Qy 1199 ATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCC 1258

III II I II I I III I II I II I II MM 

Db 964 AACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACACAGGACACTGACTG---TG 1020

Qy 1259 CGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGA 1318

I I I I I II II I I I I I II I II I I I I I I I I II II II M II I I M I I 

Db 1021 GGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCCACCCTGATCCGTCGTCAGA 1080 

Qy 1319 TTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGA 1378 

II II I I I II II II M I I II I I II I I I II I I I II Mini MM II II I MM 

Db 1081 TTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCATGGGTCGGAAGCCTGCCTGA 1140

Qy 1379 TGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGG 1438 

II I I I I M I II I I I I I I II I II II II II I II M I I M I II I II II II 
Db 1141 TGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCCAAGCAGCTCTCCTTCATGG 1200 

Qy 1439 ATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATG 1498

I I II II I I II I I I I I I II I I I M II I II I I II I II I I I II I I I II I Ml I 

Db 1201 ACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCTTTCAATGTCATCCTGGATG 1260 

Qy 1499 TCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGT 1558

II I I I I I I I II I I I I II II I II I II II M I I II M I I I II I I II I I I I I I II II 

Db 1261 TCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTATGAGCTGGAAGACGGGCTGT 1320

Qy 1559 ACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCT 1618 

I I I I II I I II I M I II I I I II I II II I I II I II II I I II II I I II I II I II I 

Db 1321 ACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCT 1380 

Qy 1619 ACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGC 1678

II I II II I I II II I I I I II I I I I I II II II II I II I I I I II II I I III 

Db 1381 ACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGC 1440

Qy 1679 CCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCC 1738 

I II II I II I M I I I II M I II I II I M I II I II II II I II I II I I I II I I II 

Db 1441 TCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCC 1500 

Qy 1739 TGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCT 1798 

II II III I III II I I I I I II I II II I II I I I II I II II II I II I II I I I I I I I 

Db 1501 TGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCT 1560 

Qy 1799 ACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGC 1858 

II M II II I I I I II I I I I II I I M II I I II II I II I M I I II II II Mill 
Db 1561 ACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGC 1620 

Qy 1859 CCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTC 1918 

I II I II I I II M I I I I I II I I I I I I I II II M II I I II I I I M I I I I 

Db 1621 CTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTC 1680 

Qy 1919 AGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAG 1978 



I I I I II I I I I I I I I I I I I I I I I I I I I I M I I I I 

Db 1681 AATTTAATGGACACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAG 1740

Qy 1979 ATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCG 2038 

II II I II I II II I II II I II II II I I II I I II I I I I I II I M II I I I 

Db 1741 ACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTG 1800 

Qy 2039 TCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAAC 2098 

I I I I I II I I I I I M I I I I I I I I I II I I II I I I I I I I I I I I I II I I I I 
Db 1801 TCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAAC 1860 

Qy 2099 AGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGA 2158 

I I I I I I I I I I I II I I I I I I I I I I I III II M 
Db 1861 AGAAGTCAATTCAAGACTGGTGATACTCAGCCTTGCTCTCACTGGCGG 1908 

Qy 2159 GCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAAT 2218 

I II II I I I I I I I I I M I M I I I I III 

Db 1909 GACCCTTTTCCCGGGGCTGGCCACCCCAGGAGGAGCCGGACTGGGGACAAGGCTCACACA 1968 

Qy 2219 GACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCAC 2278 

III III II MM II II I I I I I II II II II II I I I II I I 

Db 1969 GATCTCTCAG-------GCAGCAGCCACCTCTTAGTGCTGCAGTGGCACAGGTCAGCCAC 2021 

Qy 2279 AGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCG 2338 

II I II II I II I I M I I II II II I I I II III II II I II I I I III I II I 

Db 2022 AGGATGGCAGTAGAATAAAGACAGTTGAGAGGTGTTTCTGCTCCCAGGCCCAGGCTTGTG 2081 

Qy 2339 ATGACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTT 2398 

III I I II I II III I I II I I II II I II I 

Db 2082 ATGGGAGAGAGAGAA ACCAGGTACGTTGCTCATGCATTT 2120 

Qy 2399 GATATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGT 2458 

III I I I I I I I I I II I I II I I II II I I II II II I I II M I I 
Db 2121 --TATATCTTTAAATAAACAACCCAGTATGGAATGGGAACCAATTATATATGAATTGAGT 2178 

Qy 2459 AGCTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATT 2518 

II I II I I I I II I I I II I II I M I II I I I I I I II 

Db 2179 AGCTAGGCTATGCAGAAATTTCTGGAATCCTGAGAGGATAGTGGTTTATAGCAAAGTGTT 2238 

Qy 2519 TGGCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGA 2578 

I III II I I I I I II M I II I I I I I I I II I I II I I III 

Db 2239 TAACTTTCTCTTCTACCATTCTCACAC----TGTTAAGCCACTCCCAATACAAAGGGCGA 2294 

Qy 2579 CCTAAGATGTACCAGCAAGATG-CCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAA 2637 

I II I I I I I II I II II I II I I I I I I I II II II I M I II M II 

Db 2295 CCTAAAACAAACTAGCAAAATGTTTTTCGCTTATCTCTGCGTGGATTCATGGACTCCAAC 2354 

Qy 2638 AGCCAACGT 2646 

II II II 

Db 2355 CCCCAAAGT 2363 



RESULT 3 
BX481838 

LOCUS       BX481838                 691 bp    mRNA    linear   EST 04-SEP-2003
DEFINITION  DKFZp686M06227_r1 686 (synonym: hlcc3) Homo sapiens cDNA clone
            DKFZp686M06227 5', mRNA sequence.
ACCESSION   BX481838
VERSION     BX481838.1  GI:31941164
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 691)
  AUTHORS   Bahr,A., Lauber,J., Mewes,H.W., Weil,B., Amid,C., Osanger,A.,
            Fobo,G., Han,M. and Wiemann,S.
  TITLE     EST (Bahr,A., Lauber,J., Mewes,H.W., Weil,B., et al.)
  JOURNAL   Unpublished (2003)
COMMENT     Contact: MIPS
            MIPS
            Ingolstaedter Landstr.1, D-85764 Neuherberg, Germany
            This is the 5' sequence of the clone insert
            Clone from S. Wiemann, Molecular Genome Analysis, German Cancer
            Research Center (DKFZ); Email s.wiemann@dkfz-heidelberg.de;
            sequenced by Qiagen (Hilden/Germany) within the cDNA sequencing
            consortium of the German Genome Project.
            No si sequence available.
            This clone (DKFZp686M06227) is available at the RZPD in Berlin.
            Please contact the RZPD: Ressourcenzentrum, Heubnerweg 6, 14059
            Berlin-Charlottenburg, GERMANY; Email: clone@rzpd.de.
FEATURES             Location/Qualifiers
     source          1..691
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="DKFZp686M06227"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="686 (synonym: hlcc3)"
                     /note="Vector: pTriplEx2; Site_1: SfiIA; Site_2: SfiIB;
                     cDNA-collection"

ORIGIN 



Query Match 25.5%; Score 681.8; DB 13; Length 691; 

Best Local Similarity 99.7%; Pred. No. 7.5e-147; 

Matches 683; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 



Qy       1742 CCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACA 1801
              | ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          7 CGGCCACGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACA 66

Qy       1802 ACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCG 1861
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         67 ACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCG 126

Qy       1862 CGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGT 1921
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        127 CGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGT 186

Qy       1922 TCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATA 1981
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        187 TCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATA 246

Qy       1982 AAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCA 2041
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        247 AAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCA 306

Qy       2042 TTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGA 2101
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        307 TTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGA 366

Qy       2102 AACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCA 2161
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        367 AACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCA 426

Qy       2162 GACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGAC 2221
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        427 GACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGAC 486

Qy       2222 CCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGG 2281
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        487 CCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGG 546

Qy       2282 ATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATG 2341
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        547 ATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATG 606

Qy       2342 ACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGAT 2401
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        607 ACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGAT 666

Qy       2402 ATGCATTTATATAGGCAACTCGATA 2426
              |||||||||||||||||||||||||
Db        667 ATGCATTTATATAGGCAACTCGATA 691



RESULT 4 
BI330745 
LOCUS       BI330745                 849 bp    mRNA    linear   EST 30-JUL-2001
DEFINITION  602982409F1 NCI_CGAP_Li9 Mus musculus cDNA clone IMAGE:5135115 5',
            mRNA sequence.
ACCESSION   BI330745
VERSION     BI330745.1  GI:15015402
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 849)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Jeffrey E. Green, M.D.
            cDNA Library Preparation: Life Technologies, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLAM11332 row: a column: 04
            High quality sequence stop: 758.
FEATURES             Location/Qualifiers
     source          1..849
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="FVB/N"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:5135115"
                     /lab_host="DH10B (T1 phage-resistant)"
                     /clone_lib="NCI_CGAP_Li9"
                     /note="Organ: liver; Vector: pCMV-SPORT6; Site_1: NotI;
                     Site_2: SalI; Cloned unidirectionally. Primer: Oligo dT.
                     Average insert size 1.9 kb. Constructed by Life
                     Technologies. Note: this is a NCI_CGAP Library."

ORIGIN 

Query Match 17.3%; Score 460.6; DB 12; Length 849; 

Best Local Similarity 77.3%; Pred. No. 1.7e-95; 

Matches 639; Conservative 0; Mismatches 174; Indels 14; Gaps 6; 

Qy 991 CAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCT 1050 

I I I I I I I I I I Mill I I I I I I MM III I I I I I II I I II I II I Mill III 

Db 2 CAGCAAATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCT 61 

Qy 1051 GCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCC 1110 

II II II II II I I I I II I I I II I I II I II II I II II I II III I III I II II 

Db 62 GCGGACTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCC 121 

Qy 1111 ACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGAT 1170 

III II I II II II I I I I I II I I I I II I I I I I I I I I II I II I I I I I II I I I 

Db 122 ACCGTGGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGAT 181 

Qy 1171 GACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGC 1230 

I II I I M I I I I M I II I II I II I II II II II II I III 

Db 182 GACTTTCTGTGGAAAGCTGAGGCAAAGGAACTCAACACj^AGCACCCACACAGTCAGCCTG 241 

Qy 1231 GTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAG 1290 

I II I I I II I I I I I I I I I I II I I II II I II II 

Db 242 ACCCTCACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAG 298 

Qy 1291 CAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTC 1350 

I I II II I II I II I II I I I I II I I II II I I I I I I I I II I I I I I II I I I I II I II 
Db 299 CAGTTTTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTG 358 

Qy 1351 CTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGC 1410 

I I I II M II II I II I M I M I II II M I I I II I II II I II I II I III 

Db 359 CTCATTCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGC 418 

Qy 1411 CATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCT 1470 

I I I II I II II II I II II I I I II II I II II I II I II I I I I II I I II II II 
Db 419 CATGGGGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCG 478 

Qy 1471 CTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATG 1530 

I I I II I I I II II I I I I II I I II II I II II II II II I II I II I II I II I I II I I 
Db 479 CTCATTCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATG 538 



Qy 1531 CTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATC 1590 

II I I I I II I I II I I I I I M I II II I I II I I M I II M M I I I I I I I I I I I I I I 
Db 539 CTGTACTATGAGCTGGAAGACGGGCTGT--ACTGCTGGTCCTTATTTCTTTGCCAAGATC 596 

Qy 1591 CTCGGGGAGCTTCCGGAGCAC-TGTGCCTACATCATCATCTACGGGATGCCCACCTACTG 1649 

II II II I I I II M I I I I II I I I I I I I I I I I I I I I II I II I I I I I I I II I II 
Db 597 CTAGGAGAATTGCCGGAGCACTTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTG 656 

Qy 1650 GCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGC--ACTTCCTGCTGGTGTGG 1707 

I I I I I I I II I I M I I I I I I I I I I II I I I I I I M I I I I I I I I I I 

Db 657 GCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACCACTTTCCTGCTCGTGTGG 716 

Qy 1708 CTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTC 1767 

I I I I I I I I I I I I I I M M I I I I I I I III I I I I I I I 

Db 717 TAGGAGGTCTTCTGCTGCAGGACATGGCCTTGGTGCTCTGCCATGCTG----CCAACTTC 772 

Qy 1768 CACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCT 1814 

I I I I I I I I I I I II II I I III II I I I I II I I M I I I I I 

Db 773 CACATGTCCTCCTTCTTCTGCA--TGCCTCTTAGAATCCTTCTACCT 817 
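
The report does not state how the two percentages in each result header are derived. The figures printed for RESULT 4 are, however, consistent with "Query Match" being the alignment score divided by the perfect score (2669, from the run header), and "Best Local Similarity" being the match count divided by the number of aligned columns (matches + mismatches + indels). The sketch below only checks that arithmetic; the function names are illustrative, and the formulas are an inference from the numbers rather than documented behavior of the search program.

```python
# Assumed formulas, inferred from the figures printed in this report.

def query_match_percent(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches: int, mismatches: int, indels: int) -> float:
    """Matches as a percentage of all aligned columns."""
    aligned_columns = matches + mismatches + indels
    return 100.0 * matches / aligned_columns


# Values copied from RESULT 4 (BI330745) and the run header.
qm = query_match_percent(460.6, 2669)
bls = best_local_similarity(matches=639, mismatches=174, indels=14)
print(f"Query Match {qm:.1f}%; Best Local Similarity {bls:.1f}%")
```

The same check reproduces the rounded values reported for the other results in this listing, for example 681.8 / 2669 for RESULT 3.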



RESULT 5 
BF660076 
LOCUS       BF660076                 549 bp    mRNA    linear   EST 20-DEC-2000
DEFINITION  maa27c08.y1 NCI_CGAP_Li10 Mus musculus cDNA clone IMAGE:3812342 5'
            similar to TR:Q9VQN4 Q9VQN4 CG9664 PROTEIN. mRNA sequence.
ACCESSION   BF660076
VERSION     BF660076.1  GI:11925210
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 549)
  AUTHORS   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap.
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished (1997)
COMMENT     Other_ESTs: maa27c08.x1
            Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Jeffrey E. Green, M.D.
            cDNA Library Preparation: Life Technologies, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Washington University Genome Sequencing Center
            Clone distribution: NCI-CGAP clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            MGI:1454454
            Seq primer: -40RP from Gibco
            High quality sequence stop: 435.
FEATURES             Location/Qualifiers
     source          1..549
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:3812342"
                     /sex="female"
                     /dev_stage="10 weeks"
                     /lab_host="DH10B (T1 phage-resistant)"
                     /clone_lib="NCI_CGAP_Li10"
                     /note="Organ: liver; Vector: pCMV-SPORT6; Site_1: NotI;
                     Site_2: SalI; Cloned unidirectionally. Primer: Oligo dT.
                     Average insert size 1.6 kb. Library constructed by Life
                     Technologies."

ORIGIN 

Query Match 13.9%; Score 370.4; DB 10; Length 549; 

Best Local Similarity 79.7%; Pred. No. 1.1e-74; 

Matches 437; Conservative 0; Mismatches 111; Indels 0; Gaps 0; 

Qy 1565 CTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCA 1624 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I M I II M I I I I II I I I I III 
Db 2 CTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAATTGCCGGAGCACTGTGCCTACGTCA 61 

Qy 1625 TCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCC 1684 

I II I I I I II I II I II II I I II I I I II I I II II II II II I I I I I II I I I 
Db 62 TCATCTACGCGATGCCCATCTACTGGCTGACAAACCTGCGGCCCGTGCCTGAGCTCTTCC 121 

Qy 1685 TGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCG 1744 

I II II I I I I I I I I I I I I I II II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 122 TTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTCTGCTGCAGGACCATGGCCCTGGCTG 181 

Qy 1745 CCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACT 1804 

II I III I I I I I I I I II I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 182 CCTCTGCCATGCTGCCCACCTTCCACATGTCCTCCTTCTTCTGCAATGCCCTCTACAACT 241 

Qy 1805 CCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGT 1864 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I II I I I I I I I I I I I I I I II I 

Db 242 CCTTCTACCTTACTGCCGGCTTCATGATAAACTTGGACAACCTGTGGATAGTGCCTGCAT 301 

Qy 1865 GGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCA 1924 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 302 GGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTCTCGGGGCTGATGCAGATTCAATTTA 361 

Qy 1925 GCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAA 1984 

II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 362 ATGGACACCTTTACACCACACAAATCGGCAACTTCACCTTCTCCATCCTCGGAGACACGA 421 

Qy 1985 TCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTG 2044 

I 1111111111111 III I I I I I I II I I I II II I I I I I I I I I I I I I I I I I 

Db 422 TGATCAGTGCCATGGACCTGAACTCGCATCCACTCTATGCGATCTACCTCATTGTCATCG 481 

Qy 2045 GCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAAC 2104 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
Db 482 GCATCAGCTACGGCTTCCTGTTCCTGTACTATCTATCCTTGAAGCTCATCAAACAGAAGT 541 

Qy 2105 CAAGTCAA 2112 

III I I I I 

Db 542 CAATTCAA 549 



RESULT 6 
BY705076 
LOCUS       BY705076                 583 bp    mRNA    linear   EST 16-DEC-2002
DEFINITION  BY705076 RIKEN full-length enriched, adult male liver Mus musculus
            cDNA clone 1300003C16 5', mRNA sequence.
ACCESSION   BY705076
VERSION     BY705076.1  GI:27116215
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 583)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K., Ishii,Y.,
            Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D., Shibata,K.,
            Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R., Lander,E.S.,
            Rogers,J., Birney,E. and Hayashizaki,Y.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
  MEDLINE   22354683
   PUBMED   12466851
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P.,
            Fukuda,S., Hashizume,W., Hayashida,K., Hirozane,T., Hori,F.,
            Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y.,
            Kondo,S., Konno,H., Koya,S., Miyazaki,A., Murata,M., Nakamura,M.,
            Nomura,K., Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N.,
            Sano,H., Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M.,
            Takeda,Y., Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y.
            Direct Submission
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
FEATURES             Location/Qualifiers
     source          1..583
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="1300003C16"
                     /sex="male"
                     /tissue_type="liver"
                     /dev_stage="adult"
                     /clone_lib="RIKEN full-length enriched, adult male liver"

ORIGIN 

Query Match 13.5%; Score 361.4; DB 13; Length 583; 

Best Local Similarity 82.7%; Pred. No. 1.4e-72; 

Matches 426; Conservative 0; Mismatches 86; Indels 3; Gaps 1; 

Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 

Db 68 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 127 

Qy 159 CTC---GGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCAC 215 

II M II II II II II II II I I I II I I I I I I I I II I I I I I I I I I M I I II I I I I 
Db 128 TTCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCAC 187 

Qy 216 CTACAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGC 275 

II I I I I I I I III I I I I II I I II M II I I I I I I I I II I I M I I I I I I I I I I I II 

Db 188 CTACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGC 247 

Qy 276 CTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAG 335 

II I I I II I I I I I I I I I II I I M I I I I I I I I I I I I I I I I I I I I II I II I I I II II 
Db 248 CTCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAG 307 



Qy 336 CTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCA 395 



I I I I I I II M I I II I I I I I I I II I II II I I I I I I I I I I I I I I I II I I I II 

Db 308 CAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACA 367 

Qy 396 GATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCAC 455 

I I II I I I I I I I I I I II I I I I I I I I I I I II I I M I I I I I I I I II II II I II I II 
Db 368 GATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCAC 427 

Qy 456 TGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAG 515 

III I I I I I I I I I I II I I I I I II I I I I I II II I I I I I I I I I I I II I I I I I 
Db 428 AGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAG 487 

Qy 516 CTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCC 575 

I I I II I I I I I I I I I I I I I I I I I II II II I I I II I I I I I I I I I I I I I I II 
Db 488 TACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCC 547 

Qy 576 CAACTTGACTGTGCGAGAGACCTTGGCCTTCATTG 610 

I I I I I I I I II I I I I I II I I I I I I I II I I I 
Db 548 CAACCTGACCGTCAGAGAGACCCTGGCTTTCATTG 582 



RESULT 7 
T91380/C 
LOCUS       T91380                   457 bp    mRNA    linear   EST 22-MAR-1995
DEFINITION  yd53b02.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
            IMAGE:111915 3', mRNA sequence.
ACCESSION   T91380
VERSION     T91380.1  GI:723293
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 457)
  AUTHORS   Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
            Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
            Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
            Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
            Wilson,R.
  TITLE     The WashU-Merck EST Project
  JOURNAL   Unpublished (1995)
COMMENT     Contact: Wilson RK
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108.
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: est@watson.wustl.edu
            Insert Size: 827
            High quality sequence stops: 379 Source: IMAGE Consortium, LLNL
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            Insert Length: 827 Std Error: 0.00
            Seq primer: -21m13
            High quality sequence stop: 379.
FEATURES             Location/Qualifiers
     source          1..457
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="GDB:467532"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:111915"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
                     /clone_lib="Soares fetal liver spleen 1NFLS"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     1st strand cDNA was primed with a Pac I - oligo(dT) primer
                     [5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."



ORIGIN 



Query Match 13.3%; Score 355.8; DB 14; Length 457; 

Best Local Similarity 96.1%; Pred. No. 2.4e-71; 

Matches 438; Conservative 0; Mismatches 10; Indels 8; Gaps 7; 

Qy 2221 CCCTACAGATGCTCAG-CTACATCCGG--CCCAGGGTGCTGCAGTGGCACAGA-CCAGCC 2276 

I I I I I I I I I I I I I I I I I I I I I I I I I I III II II II II II I I I II I I I I I I I I 

Db 456 CCCTACAGATGCTCAGCCTACATCCGGCCCCCGGGTGCCTGCAGTGGCACAGACCCAGCC 397 

Qy 2277 ACAGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTT-CTGCTCACTGGCAGGAGA-C 2334 

' I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I II I I I I I I I I 

Db 396 ACAGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCCTGCTCACTGGCAGGAGANC 337 

Qy 2335 TGCGATGACTGGGAGAAAACCTGCACTCGGTGGCA-CCTACAACGTTGCTAATTTATTTC 2393 

I I I I I II I I I II I I I I I I I I I II I I I I I I I I M I I I II I I I II I I I I II M I I I I I I I 
Db 336 TGCGATGACTGGGAGAAAACCTGCACTCGGTGGCNCCCTACAACGTTGCTAATTTATTTC 277 

Qy 2394 CTTTTGATATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAAT 2453 

I I I I M I I I M I I I II I I I I I I I I I I I I I I I I I I I I M I I I I M I I I I I I I I I I I I I I M 
Db 276 CTTTTGATATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAAT 217 

Qy 2454 TGGGTAGCTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGC 2513 

I II I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
Db 216 TGGGTAGCTATACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGC 157 

Qy 2514 AGATTTGGCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAA 2573 

I I I I M I I I I II I I I I I I I I I I I I I II I I I I II I I M I I I I I I I I I M I I I I I I I I I 
Db 156 AGATTTGGCTTCATCTTCCAGGGGCNCCCAACTCCGTGGTGAGCCACCATCAATACAGAA 97 

Qy 2574 AGTGACCTAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTC 2633 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 96 AGTGACCTAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCAT-GGCTC 38 

Qy 2634 CAAAAGCCAACGTGAACAATTAAAAATGTATTGAGC 2669 

I I M I I I I I M I I I I M I I I I I I I I I I I I I I I I I I 

Db 37 CAAAAGCCAACGTGANCAATTAAAAATGTATTGAGC 2 



RESULT 8 
BX482362 



LOCUS       BX482362                 334 bp    mRNA    linear   EST 04-SEP-2003
DEFINITION  DKFZp686F02230_r1 686 (synonym: hlcc3) Homo sapiens cDNA clone
            DKFZp686F02230 5', mRNA sequence.
ACCESSION   BX482362
VERSION     BX482362.1  GI:31942182
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 334)
  AUTHORS   Koehrer,K., Beyer,A., Mewes,H.W., Weil,B., Amid,C., Osanger,A.,
            Fobo,G., Han,M. and Wiemann,S.
  TITLE     EST (Koehrer,K., Beyer,A., Mewes,H.W., Weil,B., Amid,C., et al.)
  JOURNAL   Unpublished (2003)
COMMENT     Contact: MIPS
            MIPS
            Ingolstaedter Landstr.1, D-85764 Neuherberg, Germany
            This is the 5' sequence of the clone insert
            Clone from S. Wiemann, Molecular Genome Analysis, German Cancer
            Research Center (DKFZ); Email s.wiemann@dkfz-heidelberg.de;
            sequenced by BMFZ (Biomedical Research Center at the Heinrich-
            Heine-University, Duesseldorf/Germany) within the cDNA sequencing
            consortium of the German Genome Project. No si sequence available.
            This clone (DKFZp686F02230) is available at the RZPD in Berlin.
            Please contact the RZPD: Ressourcenzentrum, Heubnerweg 6, 14059
            Berlin-Charlottenburg, GERMANY; Email: clone@rzpd.de.
FEATURES             Location/Qualifiers
     source          1..334
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="DKFZp686F02230"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="686 (synonym: hlcc3)"
                     /note="Vector: pTriplEx2; Site_1: SfiIA; Site_2: SfiIB;
                     cDNA-collection"



ORIGIN 



Query Match 12.5%; Score 332.4; DB 13; Length 334; 

Best Local Similarity 99.7%; Pred. No. 5.3e-66; 

Matches 333; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 



Qy       1291 CAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTC 1350
              |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
Db          1 CAGTTTACGACGCTGAGCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTC 60

Qy       1351 CTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGC 1410
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGC 120

Qy       1411 CATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCT 1470
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 CATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCT 180

Qy       1471 CTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATG 1530
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATG 240

Qy       1531 CTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATC 1590
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 CTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATC 300

Qy       1591 CTCGGGGAGCTTCCGGAGCACTGTGCCTACATCA 1624
              ||||||||||||||||||||||||||||||||||
Db        301 CTCGGGGAGCTTCCGGAGCACTGTGCCTACATCA 334



RESULT 9 
BB610072
LOCUS       BB610072                 510 bp    mRNA    linear   EST 26-OCT-2001
DEFINITION  BB610072 RIKEN full-length enriched, adult male liver Mus musculus
            cDNA clone 1300007N20 5', mRNA sequence.
ACCESSION   BB610072
VERSION     BB610072.1  GI:16451685
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 510)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Kondo,S., Shinagawa,A., Saito,T., Kiyosawa,H., Yamanaka,I.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K. and
            Hayashizaki,Y.
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences. Mamm. Genome. 12, 673-677 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..510
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="1300007N20"
                     /sex="male"
                     /tissue_type="liver"
                     /dev_stage="adult"
                     /clone_lib="RIKEN full-length enriched, adult male liver"
ORIGIN



  Query Match             12.4%;  Score 331.2;  DB 10;  Length 510;
  Best Local Similarity   83.7%;  Pred. No. 1.3e-65;
  Matches  375;  Conservative    0;  Mismatches   73;  Indels    0;  Gaps    0;

Qy     99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158
          |||||| | |||  |   ||| ||||    ||||  ||| ||| |    |  |||||| |
Db     63 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 122

Qy    159 CTCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTA 218
           |||||||||||||| || ||||||||||| |||||||||||||| ||||||||||||||
Db    123 TTCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTA 182

Qy    219 CAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTC 278
          |||||| ||| ||||||| |||||||||||||| |||| ||||||||||||| | |||||
Db    183 CAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTC 242

Qy    279 TCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTG 338
          |||||| |||||||||||||||||||||||||||||||| |||||||  ||||  ||| |
Db    243 TCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAG 302

Qy    339 CCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGAT 398
          |||  | || ||||||||||||||||  || ||||||||||||||||| ||||| |||||
Db    303 CCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGAT 362

Qy    399 GCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGG 458
          |||||||||||||||||||||||| || |||||||||||  | || || |||||||| ||
Db    363 GCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGG 422

Qy    459 CCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTC 518
          | |||| ||||| |||||||| || ||||| || || ||||| |||||||| |||||  |
Db    423 CAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTAC 482

Qy    519 GCCTCAGCTGGTGAGGAAGTGTGTGGCC 546
          ||||||||||||||||||||| || |||
Db    483 GCCTCAGCTGGTGAGGAAGTGCGTTGCC 510



RESULT 10 

AI157365
LOCUS       AI157365                 511 bp    mRNA    linear   EST 30-SEP-1998
DEFINITION  ui45h01.y1 Sugano mouse embryo mewa Mus musculus cDNA clone
            IMAGE:1885393 5', mRNA sequence.
ACCESSION   AI157365
VERSION     AI157365.1  GI:3685834
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 511)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:969717
            Seq primer: custom primer used
            High quality sequence stop: 480.
FEATURES             Location/Qualifiers
     source          1..511
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1885393"
                     /dev_stage="embryo, 14 dpc"
                     /lab_host="DH10B"
                     /clone_lib="Sugano mouse embryo mewa"
                     /note="Vector: pME18S-FL3; Site_1: DraIII (CACTGTGTG);
                     Site_2: DraIII (CACCATGTG); 1st strand cDNA was primed
                     with an oligo(dT) primer [ATGTGGCCTTTTTTTTTTTTTTTTT];
                     double-stranded cDNA was ligated to a DraIII adaptor
                     [TGTTGGCCTACTGG], digested and cloned into distinct DraIII
                     sites of the pME18S-FL3 vector (5' site CACTGTGTG, 3' site
                     CACCATGTG). XhoI should be used to isolate the cDNA
                     insert. Size selection was performed to exclude fragments
                     <1.5kb. Library constructed by Dr. Sumio Sugano
                     (University of Tokyo Institute of Medical Science).
                     Custom primers for sequencing: 5' end primer
                     CTTCTGCTCTAAAAGCTGCG and 3' end primer
                     CGACCTGCAGCTCGAGCACA."
ORIGIN



  Query Match             12.1%;  Score 323;  DB 9;  Length 511;
  Best Local Similarity   83.0%;  Pred. No. 1e-63;
  Matches  381;  Conservative    0;  Mismatches   75;  Indels    3;  Gaps    1;

Qy     99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158
          |||||| | |||  |   ||| ||||    ||||  ||| ||| |    |  |||||| |
Db     53 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 112

Qy    159 CTC---GGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCAC 215
           ||   |||||||||||| || ||||||||||| |||||||||||||| |||||||||||
Db    113 TTCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCAC 172

Qy    216 CTACAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGC 275
          ||||||||| ||| ||||||| |||||||||||||| |||| ||||||||||||| | ||
Db    173 CTACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGC 232

Qy    276 CTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAG 335
          ||||||||| |||||||||||||||||||||||||||||||| |||||||  ||||  ||
Db    233 CTCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAG 292

Qy    336 CTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCA 395
          | ||||  | || ||||||||||||||||  || ||||||||||||||||| ||||| ||
Db    293 CAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACA 352

Qy    396 GATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCAC 455
          ||||||||||||||||||||||||||| || |||||||||||  | || || ||||||||
Db    353 GATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCAC 412

Qy    456 TGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAG 515
           ||| |||| ||||| |||||||| || ||||| || || ||||| |||||||| |||||
Db    413 AGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAG 472

Qy    516 CTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCG 554
            |||||||||||||||||||||| || || || |||||
Db    473 TACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCG 511



RESULT 11 

T84531
LOCUS       T84531                   564 bp    mRNA    linear   EST 17-MAR-1995
DEFINITION  yd53b02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
            IMAGE:111915 5', mRNA sequence.
ACCESSION   T84531
VERSION     T84531.1  GI:712883
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 564)
  AUTHORS   Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
            Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
            Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
            Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
            Wilson,R.
  TITLE     The WashU-Merck EST Project
  JOURNAL   Unpublished (1995)
COMMENT     Contact: Wilson RK
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: est@watson.wustl.edu
            Insert Size: 827
            High quality sequence stops: 383 Source: IMAGE Consortium, LLNL
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            Insert Length: 827 Std Error: 0.00
            Seq primer: M13RP1
            High quality sequence stop: 383.
FEATURES             Location/Qualifiers
     source          1..564
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="GDB:467532"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:111915"
                     /sex="male"
                     /dev_stage="20 week-post conception fetus"
                     /lab_host="DH10B (ampicillin resistant)"
                     /clone_lib="Soares fetal liver spleen 1NFLS"
                     /note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
                     with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
                     1st strand cDNA was primed with a Pac I - oligo(dT) primer
                     [5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
                     double-stranded cDNA was ligated to Eco RI adaptors
                     (Pharmacia), digested with Pac I and cloned into the Pac I
                     and Eco RI sites of the modified pT7T3 vector. Library
                     went through one round of normalization. Library
                     constructed by Bento Soares and M.Fatima Bonaldo."
ORIGIN



  Query Match             11.9%;  Score 318.4;  DB 14;  Length 564;
  Best Local Similarity   98.9%;  Pred. No. 1.2e-62;
  Matches  352;  Conservative    0;  Mismatches    1;  Indels    3;  Gaps    3;

Qy   1871 CCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAA 1930
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 CCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAA 60

Qy   1931 GAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCA 1990
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 GAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCA 120

Qy   1991 GTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCA 2050
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 GTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCA 180

Qy   2051 GCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTC 2110
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 GCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTC 240

Qy   2111 AAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGA-GCAGACCCTTC 2169
          |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db    241 AAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGGCAGACCCTTC 300

Qy   2170 AACTGCAC-TCCCTCCTCAGGAGCCCCTTCCTGGGG-ACAGTGAGGACAATGACCC 2223
          |||||||| ||||||||||||||||||||||||||| |||||||||||||||| ||
Db    301 AACTGCACTTCCCTCCTCAGGAGCCCCTTCCTGGGGAACAGTGAGGACAATGAACC 356



RESULT 12 

AI151811
LOCUS       AI151811                 500 bp    mRNA    linear   EST 30-SEP-1998
DEFINITION  ui46c10.y1 Sugano mouse embryo mewa Mus musculus cDNA clone
            IMAGE:1885458 5', mRNA sequence.
ACCESSION   AI151811
VERSION     AI151811.1  GI:3680280
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 500)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:969782
            Seq primer: custom primer used
            High quality sequence stop: 499.
FEATURES             Location/Qualifiers
     source          1..500
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1885458"
                     /dev_stage="embryo, 14 dpc"
                     /lab_host="DH10B"
                     /clone_lib="Sugano mouse embryo mewa"
                     /note="Vector: pME18S-FL3; Site_1: DraIII (CACTGTGTG);
                     Site_2: DraIII (CACCATGTG); 1st strand cDNA was primed
                     with an oligo(dT) primer [ATGTGGCCTTTTTTTTTTTTTTTTT];
                     double-stranded cDNA was ligated to a DraIII adaptor
                     [TGTTGGCCTACTGG], digested and cloned into distinct DraIII
                     sites of the pME18S-FL3 vector (5' site CACTGTGTG, 3' site
                     CACCATGTG). XhoI should be used to isolate the cDNA
                     insert. Size selection was performed to exclude fragments
                     <1.5kb. Library constructed by Dr. Sumio Sugano
                     (University of Tokyo Institute of Medical Science).
                     Custom primers for sequencing: 5' end primer
                     CTTCTGCTCTAAAAGCTGCG and 3' end primer
                     CGACCTGCAGCTCGAGCACA."
ORIGIN

  Query Match             11.6%;  Score 309.8;  DB 9;  Length 500;
  Best Local Similarity   83.0%;  Pred. No. 1.2e-60;
  Matches  366;  Conservative    0;  Mismatches   72;  Indels    3;  Gaps    1;

Qy     99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158
          |||||| | |||  |   ||| ||||    ||||  ||| ||| |    |  |||||| |
Db     60 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 119

Qy    159 CTC---GGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCAC 215
           ||   |||||||||||| || ||||||||||| |||||||||||||| |||||||||||
Db    120 TTCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCAC 179

Qy    216 CTACAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGC 275
          ||||||||| ||| ||||||| |||||||||||||| |||| ||||||||||||| | ||
Db    180 CTACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGC 239

Qy    276 CTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAG 335
          ||||||||| |||||||||||||||||||||||||||||||| |||||||  ||||  ||
Db    240 CTCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAG 299

Qy    336 CTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCA 395
          | ||||  | || ||||||||||||||||  || ||||||||||||||||| ||||| ||
Db    300 CAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACA 359

Qy    396 GATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCAC 455
          ||||||||||||||||||||||||||| || |||||||||||  | || || ||||||||
Db    360 GATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCAC 419

Qy    456 TGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAG 515
           ||| |||| ||||| |||||||| || ||||| || || ||||| |||||||| |||||
Db    420 AGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAG 479

Qy    516 CTCGCCTCAGCTGGTGAGGAA 536
            ||||||||||||||| |||
Db    480 TACGCCTCAGCTGGTGAAGAA 500



RESULT 13 

AA537862
LOCUS       AA537862                 463 bp    mRNA    linear   EST 29-JUL-1997
DEFINITION  vj35a03.r1 Stratagene mouse diaphragm (#937303) Mus musculus cDNA
            clone IMAGE:930988 5', mRNA sequence.
ACCESSION   AA537862
VERSION     AA537862.1  GI:2283855
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 463)
  AUTHORS   Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
            Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
            Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
            Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
            Waterston,R.
  TITLE     The WashU-HHMI Mouse EST Project
  JOURNAL   Unpublished (1996)
COMMENT     Contact: Marra M/Mouse EST Project
            WashU-HHMI Mouse EST Project
            Washington University School of MedicineP
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:535908
            Seq primer: -28m13 rev1 ET from Amersham
            High quality sequence stop: 393.
FEATURES             Location/Qualifiers
     source          1..463
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:930988"
                     /tissue_type="diaphragm"
                     /dev_stage="adult"
                     /lab_host="SOLR (kanamycin resistant)"
                     /clone_lib="Stratagene mouse diaphragm (#937303)"
                     /note="Organ: diaphragm; Vector: pBluescript SK-; Site_1:
                     EcoRI; Site_2: XhoI; Cloned unidirectionally from mRNA
                     prepared from diaphragm muscle. Primer: Oligo dT. Average
                     insert size: 1.5 kb. Uni-ZAP XR Vector; ~5' adaptor
                     sequence: 5' GAATTCGGCACGAG 3' ~3' adaptor sequence: 5'
                     CTCGAGTTTTTTTTTTTTTTTTTT 3'"
ORIGIN



  Query Match             10.4%;  Score 276.8;  DB 9;  Length 463;
  Best Local Similarity   76.3%;  Pred. No. 5e-53;
  Matches  354;  Conservative    0;  Mismatches  107;  Indels    3;  Gaps    1;

Qy   1117 GAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTT 1176
          |||||||| ||||| || ||||||||||| ||||||||||| |  | ||| |||||||||
Db      1 GAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTT 60

Qy   1177 CTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACC 1236
          || |||||||| ||| | ||||| ||  ||     ||||      |  |||        |
Db     61 CTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTC 120

Qy   1237 CCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTT 1296
           |||  |||||  ||||      || | ||    || |||| |||  | |  ||||||||
Db    121 ACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTT 177

Qy   1297 ACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATC 1356
           | || ||||||||||||||||||||||| |||||||| ||||||||||| || |||||
Db    178 TCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATT 237

Qy   1357 CATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGG 1416
          |||||| |||| ||||| ||||||||  | | ||| |||||||| || |  |||||||||
Db    238 CATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGG 297

Qy   1417 AGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATC 1476
                ||||||||||||||||| ||||| |||||| | |||||||| || || |||||
Db    298 GCAGAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATT 357

Qy   1477 CCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTAC 1536
          |||||||| ||||| ||||||||| ||||||||||| |||| |||||  ||||||| |||
Db    358 CCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGCTCAATGCTGTAC 417

Qy   1537 TATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTT 1580
          ||||| ||||||||||||||||||||  |   | ||||||||||
Db    418 TATGAGCTGGAAGACGGGCTGTACACTGCCAATACATATTTCTT 461



RESULT 14 

CB502603/C
LOCUS       CB502603                 781 bp    mRNA    linear   EST 16-MAY-2003
DEFINITION  ssalmge503002 gut Salmo salar cDNA, mRNA sequence.
ACCESSION   CB502603
VERSION     CB502603.1  GI:29313829
KEYWORDS    EST.
SOURCE      Salmo salar (Atlantic salmon)
  ORGANISM  Salmo salar
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei;
            Protacanthopterygii; Salmoniformes; Salmonidae; Salmo.
REFERENCE   1  (bases 1 to 781)
  AUTHORS   GRASP Consortium, Davidson,W.S., Koop,B.F. and
            http://web.uvic.ca/cbr/grasp.
  TITLE     A survey of Salmo salar transcripts from high complexity cDNA
            libraries
  JOURNAL   Unpublished (2002)
COMMENT     Contact: Koop BF
            Centre for Biomedical Research
            University of Victoria
            PO Box 3020 STN CSC, Victoria BC, V8W 3N5, Canada
            Tel: 250 472 4067
            Fax: 250 472 4075
            Email: bkoop@uvic.ca
            Genome Sciences Centre, BC Cancer Agency cDNA preparation,
            sequencing and bioinformatics: Y Butterfield, R Kirkpatrick, J
            Asano, N Girn, R Guin, D Lee, S Lee, T Olson, P Pandoh, A Prahbu, D
            Smailus, L Spence, J Stott, S Taylor, G Yang, J Schein, S Jones and
            M Marra.
            POLYA=Yes.
FEATURES             Location/Qualifiers
     source          1..781
                     /organism="Salmo salar"
                     /mol_type="mRNA"
                     /strain="McConnell"
                     /db_xref="taxon:8030"
                     /clone_lib="gut"
                     /note="Vector: pBlueScriptIISK+; Library Creator: Matthew
                     L Rise; Atlantic salmon tissue contributors: Carlo Biagi,
                     Mitch Uh and Robert Devlin (DFO, Vancouver, B.C.), Simon
                     Jones (PBS, Nanaimo, B.C.), Seaspring Hatchery (Crofton,
                     B.C.), Rachel Roper (University of Victoria)"
ORIGIN



  Query Match             10.3%;  Score 276;  DB 14;  Length 781;
  Best Local Similarity   65.7%;  Pred. No. 1e-52;
  Matches  402;  Conservative    0;  Mismatches  210;  Indels    0;  Gaps    0;

Qy   1510 TGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGT 1569
          ||| || ||||||| || ||| | ||| |||| ||||| |||||  |||| | |
Db    769 TGTCACACAGAGAGAGCTATGTTGTACCATGAGCTGGAGGACGGCATGTATAACGTCACA 710

Qy   1570 CCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATC 1629
           | || |||||||||||| |||| ||||||||||| ||||||||||  | ||   |  ||
Db    709 TCCTACTTCTTTGCCAAGGTCCTGGGGGAGCTTCCAGAGCACTGTGTGTTCACGTTGGTC 650

Qy   1630 TACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTG 1689
          |||||  | |||| ||||||||||||   |||||  |  | ||   | | |||||||||
Db    649 TACGGCCTACCCATCTACTGGCTGGCTGGCCTGAACCAGGCCCCGGACCGCTTCCTGCTC 590

Qy   1690 CACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCG 1749
           |||||||||||||||||||  |||| | |||  || | |  ||||| |||   |  ||
Db    589 AACTTCCTGCTGGTGTGGCTCATGGTGTACTGCAGCCGCAGCATGGCTCTGTTTGTGGCT 530

Qy   1750 GCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTC 1809
          ||     | |||||| | || |   |  ||||| |  ||||| | || | ||     |||
Db    529 GCAGCCTTACCCACCCTGCAGACATCAGCCTTCATGGGCAATTCTCTGTTCACTGTGTTC 470

Qy   1810 TACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATT 1869
          |||||  | || |||||| | || | | ||   | | |||||   ||| |  | ||| |
Db    469 TACCTTACTGGAGGCTTCGTCATTAGCCTGGAGAACATGTGGTTCGTGGCGTCCTGGTTC 410

Qy   1870 TCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGA 1929
          ||| | |  |||||| |||| ||| | ||||| ||  || || || | ||||||||  ||
Db    409 TCCCATGCCTCCTTCATGCGCTGGGGCTTTGAGGGCATGCTGCAGGTCCAGTTCAGGGGA 350

Qy   1930 AGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTC 1989
             |  ||     |  |  |||| ||| ||||| ||    ||   ||     |  |  |
Db    349 CTCAAGTACCCCGTCTCCATCGGCAACTTCACCCTCAACATCGATGGCATACATGTGGTG 290

Qy   1990 AGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTC 2049
             || |||||  || ||  ||||||||||||| ||  ||||||  |  |||| |   ||
Db    289 GAAGCTATGGATATGAACCAGTACCCTCTCTACTCCTGCTACCTGGTTCTCATCGCTGTC 230

Qy   2050 AGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGT 2109
             |  ||||||||| | || |||||| | ||  | |  |||||||| |||||  | ||
Db    229 GTAGTGGGCTTCATGCTGCTCTACTACCTATCACTCAAATTCATCAAGCAGAAGTCCAGC 170

Qy   2110 CAAGACTGGTGA 2121
          || |||||||||
Db    169 CAGGACTGGTGA 158



RESULT 15 

CD739823
LOCUS       CD739823                 640 bp    mRNA    linear   EST 26-JUN-2003
DEFINITION  4028769 1GAL - Chicken Intestinal Lymphocyte Gallus gallus cDNA
            clone 1GAL_21P20 5', mRNA sequence.
ACCESSION   CD739823
VERSION     CD739823.1  GI:32290672
KEYWORDS    EST.
SOURCE      Gallus gallus (chicken)
  ORGANISM  Gallus gallus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Archosauria; Aves; Neognathae; Galliformes; Phasianidae;
            Phasianinae; Gallus.
REFERENCE   1  (bases 1 to 640)
  AUTHORS   Min,W., Lillehoj,H.S., Ashwell,C.M., Matukumalli,L.K., van
            Tassel,C. and Han,J.Y.
  TITLE     Chicken intestinal lymphocyte EST database as a resource for the
            analysis of mucosal immune function
  JOURNAL   Unpublished (2003)
COMMENT     Contact: Hyun S. Lillehoj
            Animal Parasite Diseases Laboratory
            Animal and Natural Resources Institute, USDA
            Bldg.1043, BARC-East, Beltsville, MD 20705, USA
            Tel: 3015048771
            Fax: 3015045103
            Email: hlilleho@anri.barc.usda.gov
            Single pass sequencing. Bases called and trimmed with phred
            0.000925 using options -trim_alt -trim_fasta. Vector identified
            by cross_match using options -minmatch 12 -minscore 18
            Plate: 21 row: P column: 20
            Seq primer: ATTTAGGTGACACTATAG
            High quality sequence stop: 640.
FEATURES             Location/Qualifiers
     source          1..640
                     /organism="Gallus gallus"
                     /mol_type="mRNA"
                     /strain="white leghorn SC"
                     /db_xref="taxon:9031"
                     /clone="1GAL_21P20"
                     /sex="mixed"
                     /tissue_type="Gut"
                     /cell_type="Lymphocyte"
                     /dev_stage="Adult"
                     /lab_host="EMDH10B"
                     /clone_lib="1GAL - Chicken Intestinal Lymphocyte"
                     /note="Organ: Intestine; Vector: pCMV-SPORT6; Site_1:
                     SalI; Site_2: NotI; Normalized library from chicken gut
                     infected with coccidia duodenum and middle gut."
ORIGIN



  Query Match              9.9%;  Score 263.2;  DB 14;  Length 640;
  Best Local Similarity   64.7%;  Pred. No. 8.5e-50;
  Matches  407;  Conservative    0;  Mismatches  219;  Indels    3;  Gaps    1;

Qy   1351 CTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGC 1410
          ||  |||||||    ||||||  | | |||||| |    || || ||  | || | ||||
Db      6 CTAGTCCATGGATTTGAGGCCCTTGTCATGTCATTATTAATTGGATTTTTGTACTATGGC 65

Qy   1411 CATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCT 1470
          || |   |||     |||||| |    || ||| | || ||  ||| |||||| |||||
Db     66 CACGAAGGCA---GACTCTCCATTCGTGACACATCAGCACTGCTGTACATGATAGGTGCA 122

Qy   1471 CTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATG 1530
          || ||||| ||||  || ||| ||||||| ||   | ||||| | ||||| || ||||||
Db    123 CTAATCCCATTCACGGTGATTTTGGATGTTATTGTCTAATGTCATTCAGAAAGAGCAATG 182

Qy   1531 CTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATC 1590
          |||||   |||  ||||| | ||  ||||  |   |   || || |||||||| |||||
Db    183 CTTTATCTTGACTTGGAAAATGGAATGTATTCTGTTACCCCGTACTTCTTTGCTAAGATT 242

Qy   1591 CTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGG 1650
           | ||||||||||| |||||||| || | | | || || || ||| | |||| |||||||
Db    243 TTGGGGGAGCTTCCCGAGCACTGCGCTTTCGTTATAATTTATGGGGTTCCCATCTACTGG 302

Qy   1651 CTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTG 1710
          ||| | || ||    || |      | |  || |||||| ||||| |    |||||||||
Db    303 CTGACAAATCTATTTCCTGAAGCAGAACATTTTCTGCTGAACTTCTTCTCAGTGTGGCTG 362

Qy   1711 GTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCAC 1770
          |  || | |||   | |    ||||| ||    |  || || ||||| || || || ||
Db    363 GCTGTATACTGCGCCCGTGCAATGGCACTTTGGGTGGCAGCACTGCTGCCAACGTTACAG 422

Qy   1771 ATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATG 1830
           |  |  | ||| |  |||||| ||| | ||  || ||||||||   ||| || ||  ||
Db    423 CTCTCAGCTTTCCTTGGCAATGTCCTTTTCACTTCGTTCTACCTGAGCGGTGGTTTTGTG 482

Qy   1831 ATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGG 1890
          |||| | ||      || |||||||| ||    ||| |||| || || || || ||  |
Db    483 ATAAGCCTGGAACAACTCTGGACAGTTCCATATTGGGTTTCTAAGGTATCTTTTCTCAGA 542

Qy   1891 TGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTC 1950
          |||  |||  ||||  ||||| | ||||||||||        |  ||| | |||||| |
Db    543 TGGAATTTCCAAGGCATGATGCANATTCAGTTCACTGATTCCATATATGATATGCCTTTT 602

Qy   1951 GGGAACCTCACCATCGCGGTCTCAGGAGA 1979
          |||||| |||| ||     |  ||||| |
Db    603 GGGAACGTCACAATTAAAATTCCAGGAAA 631
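
Note on the summary lines: the figures printed above each alignment are arithmetically linked, which gives a convenient check when a block has been damaged in transcription. The sketch below is an inference from the numbers in this report, not a description of how the search program computes them. It assumes that Query Match is the hit score as a percentage of the perfect score (2669 for this query) and that Best Local Similarity is Matches divided by the number of alignment columns (Matches + Mismatches + Indels). The names `Hit`, `HITS`, `PERFECT_SCORE` and `match_line` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Assumption: the perfect score is the query's self-alignment score (2669 in this report).
PERFECT_SCORE = 2669


@dataclass
class Hit:
    """Summary figures copied from one RESULT block of the report."""
    score: float
    matches: int
    mismatches: int
    indels: int

    def query_match(self) -> float:
        """Score as a percentage of the perfect score, rounded to one decimal."""
        return round(100.0 * self.score / PERFECT_SCORE, 1)

    def best_local_similarity(self) -> float:
        """Matches as a percentage of aligned columns, rounded to one decimal."""
        columns = self.matches + self.mismatches + self.indels
        return round(100.0 * self.matches / columns, 1)


def match_line(qy: str, db: str) -> str:
    """Return '|' under identical bases and ' ' under mismatches or gaps ('-')."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join("|" if q == d and q != "-" else " " for q, d in zip(qy, db))


# Score, Matches, Mismatches, Indels as printed for Results 9 to 15.
HITS = {
    "BB610072": Hit(331.2, 375, 73, 0),
    "AI157365": Hit(323.0, 381, 75, 3),
    "T84531": Hit(318.4, 352, 1, 3),
    "AI151811": Hit(309.8, 366, 72, 3),
    "AA537862": Hit(276.8, 354, 107, 3),
    "CB502603": Hit(276.0, 402, 210, 0),
    "CD739823": Hit(263.2, 407, 219, 3),
}

if __name__ == "__main__":
    for name, hit in HITS.items():
        print(name, hit.query_match(), hit.best_local_similarity())
    # Final block of Result 12 (Qy 516-536 against Db 480-500).
    print(repr(match_line("CTCGCCTCAGCTGGTGAGGAA", "TACGCCTCAGCTGGTGAAGAA")))
```

Under these assumptions both formulas reproduce the percentages printed for Results 9 to 15, and counting the `|` characters returned by `match_line` over every block of a hit should give that hit's Matches figure.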
ALIGNMENTS



RESULT 1 
AX685735
LOCUS       AX685735                2669 bp    DNA     linear   PAT 29-MAR-2003
DEFINITION  Sequence 7 from Patent WO02081691.
ACCESSION   AX685735
VERSION     AX685735.1  GI:29371744
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
  TITLE     Abcg5 and abcg8: compositions and methods of use
  JOURNAL   Patent: WO 02081691-A 7 17-OCT-2002;
            Tularik Inc. (US) ; BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
            (US)
FEATURES             Location/Qualifiers
     source          1..2669
                     /organism="Homo sapiens"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:9606"
     CDS             100..2121
                     /note="unnamed protein product; human ABCG8 (hABCG8)"
                     /codon_start=1
                     /protein_id="CAD86573.1"
                     /db_xref="GI:29371745"
                     /db_xref="REMTREMBL:CAD86573"
                     /translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP
                     NTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA
                     IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN
                     LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER
                     RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI
                     FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ
                     ELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKM
                     PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAA
                     LLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYI
                     IIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL
                     YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAV
                     SGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW"
ORIGIN



Query Match 100.0%; Score 2669; DB 6; Length 2669; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2 669; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 GTGTCCCTGCTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCT 60 

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 GTGTCCCTGCTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCT 60 

Qy 61 AAGAGAGCTGCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 61 AAGAGAGCTGCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAG 120 

Qy 121 GAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTG 180 

I I I.I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 GAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTG 180 

Qy 181 TTCTCCTCTGA7UVGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTG 240 

I I I I I I I I I I II I II I I I I I M I I II I I II I I I I I II II I I I II I I II I II II I I M I I I 
Db 181 TTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTG 240 

Qy 241 GAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAG 300 

I II I II I I II I I I I I I I II II I I I I I I I I II I I I I I II II II I I II I II I I II I I I I II I 
Db 241 GAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAG 300 

Qy 301 CTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGC 360 

I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I 
Db 301 CTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGC 360 

Qy 361 ATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCA 420 

I I II I I I I I I I II II II I I I I I II I I I II I I I I I I I I I I II M I I I II I II I II I II I I I 
Db 361 ATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCA 420 

Qy 421 GGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATC 480 

I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I II I I I I I I I II I I I I I I I II I I I I I I I 

Db 421 GGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATC 480 

Qy 481 AAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGT 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 AAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGT 540 

Qy 541 GTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTG 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 541 GTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTG 600 

Qy 601 GCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGG 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 GCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGG 660 

Qy 661 GTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAAC 720 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 661 GTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAAC 720 

Qy 721 ATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTC 780 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 721 ATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTC 780 



Qy 781 CTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACA 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I I I I I I I I 
Db 781 CTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACA 840 

Qy 841 GCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATC 900 

I I I I I I I I I I I M I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
Db 841 GCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATC 900 

Qy 901 TCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACG 960 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 901 TCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACG 960 

Qy 961 TCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATC 1020 

I I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I M I I M I I I I 11 I I I I 

Db 961 TCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATC 1020 

Qy 1021 GGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATT 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1021 GGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATT 1080 

Qy 1081 GACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCC 1140 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I M I M I I I I I I I 
Db 1081 GACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCC 1140 

Qy 1141 CTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGAT 1200 

I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1141 CTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGAT 1200 

Qy 1201 CTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCG 1260 

I I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I M I I M I I I I I I I I I I I I M I I I I 

Db 1201 CTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCG 1260 

Qy 1261 AGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATT 1320 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1261 AGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATT 1320 

Qy 1321 TCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATG 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 1321 TCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATG 1380 

Qy 1381 TCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGAT 1440 

I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1381 TCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGAT 1440 

Qy 1441 ACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTC 1500 

I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1441 ACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTC 1500 

Qy 1501 ATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTAC 1560 

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 1501 ATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTAC 1560 

Qy 1561 ACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTAC 1620 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1561 ACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTAC 1620 



Qy 1621 ATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCC 1680 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1621 ATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCC 1680 

Qy 1681 TTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTG 1740 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
Db 1681 TTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTG 1740 

Qy 1741 GCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTAC 1800 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1741 GCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTAC 1800 

Qy 1801 AACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCC 1860 

I I I I I I M I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I I I M I I I I I I 
Db 1801 AACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCC 1860 

Qy 1861 GCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAG 1920 

M I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1861 GCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAG 1920 

Qy 1921 TTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGAT 1980 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1921 TTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGAT 1980 

Qy 1981 AAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTC 2040 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1981 AAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTC 2040 

Qy 2041 ATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAG 2100 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M M I M I I I M I I I I I I I I 

Db 2041 ATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAG 2100 

Qy 2101 AAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGC 2160 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 2101 AAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGC 2160 

Qy 2161 AGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGA 2220 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2161 AGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGA 2220 

Qy 2221 CCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAG 2280 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2221 CCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAG 2280 

Qy 2281 GATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGAT 2340 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2281 GATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGAT 2340 

Qy 2341 GACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGA 2400 

I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I I 
Db 2341 GACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGA 2400 

Qy 2401 TATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAG 2460 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 

Db 2401 TATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAG 2460 

Qy 2461 CTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTG 2520 



I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2461 CTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTG 2520 

Qy 2521 GCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACC 2580 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I M I I I I I I I I 
Db 2521 GCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACC 2580 

Qy 2581 TAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I I I I I I I I I I I I I I I 
Db 2581 TAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640 

Qy 2641 CAACGTGAACAATTAAAAATGTATTGAGC 2669 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2641 CAACGTGAACAATTAAAAATGTATTGAGC 2669 
source 1..2679 

/organism="Homo sapiens" 
/mol_type="mRNA" 
/db_xref="taxon:9606" 
/chromosome="2" 

/map="2p21; between D2S2294 and D2S2298" 

/tissue_type="liver" 

gene 1..2679 

/gene="ABCG8" 

CDS 91..2112 

/gene="ABCG8" 

/codon_start=1 



/product="sterolin-2" 
/protein_id="AAK84078.1" 
/db_xref="GI:15088540" 

/translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP 

NTLEVRDLNCQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA 

IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN 

LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER 

RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI 

FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ 

ELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKM 

PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAA 

LLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYI 

IIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL 

YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAV 

SGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW" 

ORIGIN 



Query Match 99.5%; Score 2655.2; DB 9; Length 2679; 

Best Local Similarity 99.9%; Pred. No. 0; 

Matches 2657; Conservative 0; Mismatches 3; Indels 0; Gaps 0; 



Qy 10 CTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCTAAGAGAGCT 69 

I I I I I I I I I M I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 1 CTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCTAAGAGAGCT 60 



Qy 70 GCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAGGAGAGAGGG 129 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 GCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAGGAGAGAGGG 120 



Qy 130 CTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTGTTCTCCTCT 189 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 CTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTGTTCTCCTCT 180 



Qy 190 GAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTGGAGGTCAGA 249 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 GAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTGGAGGTCAGA 240 



Qy 250 GACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAG 309 

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 241 GACCTCAACTGCCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAG 300 

Qy 310 TTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAAC 369 

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 301 TTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAAC 360 



Qy 370 CTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGG 429 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 CTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGG 420 



Qy 430 AGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGC 489 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 AGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGC 480 

Qy 490 CAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCAC 549 

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 CAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCAC 540 



Qy 550 GTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCATT 609 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 541 GTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCATT 600 

Qy 610 GCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGAC 669 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 601 GCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGAC 660 

Qy 670 GTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACGTG 729 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 661 GTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACGTG 720 

Qy 730 CGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAAC 789 

I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I 

Db 721 CGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAAC 780 

Qy 790 CCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACAAC 849 

I I I I I I I M I I I I M I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 781 CCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACAAC 840 

Qy 850 CTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCAC 909 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 841 CTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCAC 900 

Qy 910 CAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCACC 969 

I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 901 CAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCACC 960 

Qy 970 CCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCC 1029 

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 961 CCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCC 1020 

Qy 1030 TGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGC 1089 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1021 TGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGC 1080 

Qy 1090 AGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTA 1149 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1081 AGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTA 1140 

Qy 1150 GAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAG 1209 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1141 GAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAG 1200 

Qy 1210 GACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACG 1269 

I M I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1201 GACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACG 1260 

Qy 1270 AAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGAC 1329 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1261 AAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGAC 1320 

Qy 1330 TTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACC 1389 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 1321 TTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACC 1380 

Qy 1390 ATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCC 1449 



Db 1381 ATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCC 1440 

Qy 1450 CTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAA 1509 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
Db 1441 CTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAA 1500 

Qy 1510 TGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGT 1569 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1501 TGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGT 1560 

Qy 1570 CCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATC 1629 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1561 CCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATC 1620 

Qy 1630 TACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTG 1689 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1621 TACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTG 1680 

Qy 1690 CACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCG 1749 

I I I I I I I I I I I I I I M I I M I I I I I I I I M I I I I I I I I I I M I I I M I I I I I I I I I I I I I 

Db 1681 CACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCG 1740 

Qy 1750 GCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTC 1809 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I I I I I I 
Db 1741 GCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTC 1800 

Qy 1810 TACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATT 1869 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I 
Db 1801 TACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATT 1860 

Qy 1870 TCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGA 1929 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
Db 1861 TCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGA 1920 

Qy 1930 AGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTC 1989 

I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1921 AGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTC 1980 

Qy 1990 AGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTC 2049 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
Db 1981 AGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTC 2040 

Qy 2050 AGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGT 2109 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I M I I I I I I M I I I I I I I I I I I I I I I I I I I I 
Db 2041 AGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGT 2100 

Qy 2110 CAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTC 2169 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I M I I I I I 
Db 2101 CAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTC 2160 

Qy 2170 AACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGA 2229 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I I I I I I I 
Db 2161 AACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGA 2220 

Qy 2230 TGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGT 2289 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 2221 TGCTCAGCTACATCCGGCCCAGGGTGCTGCGGTGGCACAGACCAGCCACAGGATGGCAGT 2280 



Qy 2290 AGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAG 2349 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2281 AGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAG 2340 

Qy 2350 AAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTT 2409 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2341 AAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTT 2400 

Qy 2410 ATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGT 2469 

I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
Db 2401 ATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGT 2460 

Qy 2470 GCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCT 2529 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
Db 2461 GCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTACCTAGCAGATTTGGCTTCATCT 2520 

Qy 2530 TCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTA 2589 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I M I I I I I I I I I I I I I I I I I I I I I I 
Db 2521 TCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTA 2580 

Qy 2590 CCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAA 2649 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2581 CCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAA 2640 

Qy 2650 CAATTAAAAATGTATTGAGC 2669 

I I I I I I I I I I I I I I I I I I I I 
Db 2641 CAATTAAAAATGTATTGAGC 2660 
LOCUS       AF320294     2022 bp    mRNA    linear   PRI 13-DEC-2000
DEFINITION  Homo sapiens ABCG8 (ABCG8) mRNA, complete cds.
ACCESSION   AF320294
VERSION     AF320294.1  GI:11692801
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 2022)
  AUTHORS   Berge,K.E., Tian,H., Graf,G.A., Yu,L., Grishin,N.V., Schultz,J.,
            Kwiterovich,P., Shan,B., Barnes,R. and Hobbs,H.H.
  TITLE     Accumulation of Dietary Cholesterol in Sitosterolemia Caused by
            Mutations in Adjacent ABC Transporters
  JOURNAL   Science (2001) In press
REFERENCE   2 (bases 1 to 2022)
  AUTHORS   Berge,K.E., Tian,H., Graf,G.A., Yu,L., Grishin,N.V., Schultz,J.,
            Kwiterovich,P., Shan,B., Barnes,R. and Hobbs,H.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-NOV-2000) Molecular Genetics, University of Texas,
            Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd.,
            Dallas, TX 75390-9046, USA
FEATURES             Location/Qualifiers

source 1..2022 



/organism="Homo sapiens" 

/mol_type="mRNA" 

/db_xref="taxon:9606" 
gene 1..2022 

/gene="ABCG8" 
CDS 1..2022 

/gene="ABCG8" 

/note="ATP-binding cassette, subfamily G, member 8" 

/codon_start=1 

/product="ABCG8" 

/protein_id="AAG40004.1" 

/db_xref="GI:11692802" 

/translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP 

NTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA 

IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN 

LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER 

RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI 

FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ 

ELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKM 

PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAA 

LLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYI 

IIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL 

YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAV 

SGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW" 

ORIGIN 

Query Match 75.7%; Score 2020.4; DB 9; Length 2022; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2021; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 100 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 159 

I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 60 

Qy 160 TCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTAC 219 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I 
Db 61 TCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTAC 120 

Qy 220 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 279 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 121 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 180 

Qy 280 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 339 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 240 

Qy 340 CAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATG 399 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I 
Db 241 CAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATG 300 

Qy 400 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 459 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 360 

Qy 460 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 519 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 420 



Qy 520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579 

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 421 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 480 

Qy 580 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 639 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 540 

Qy 640 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 699 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
Db 541 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 600 

Qy 700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 660 

Qy 760 GTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 819 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 661 GTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 720 

Qy 820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 721 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 780 

Qy 880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939 

I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 781 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 840 

Qy 940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I 

Db 841 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 900 

Qy 1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 901 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 960 

Qy 1060 TATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAG 1119 

I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 961 TATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAG 1020 

Qy 1120 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1179 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1021 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1080 

Qy 1180 TGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCA 1239 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1081 TGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCA 1140 

Qy 1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1141 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1200 

Qy 1300 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1359 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1201 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1260 



Qy  1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1261 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1320

Qy  1420 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1321 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1380

Qy  1480 TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1539
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1381 TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1440

Qy  1540 GAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAG 1599
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1441 GAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAG 1500

Qy  1600 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1501 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1560

Qy  1660 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1561 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1620

Qy  1720 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1621 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1680

Qy  1780 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1839
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1681 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1740

Qy  1840 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1741 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1800

Qy  1900 GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1959
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1801 GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1860

Qy  1960 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTC 2019
         |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Db  1861 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGTCATGGAGCTGGACTCGTACCCTCTC 1920

Qy  2020 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1921 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 1980

Qy  2080 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2121
         ||||||||||||||||||||||||||||||||||||||||||
Db  1981 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2022
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Tang,Y.T., Yue,H., Nguyen,D.B., Hafalia,A.J., Elliott,V.S., Lu,Y.,
Walia,N.K., Yao,M.G., Baughn,M.R., Gandhi,A.R., Ding,L.,
Sanjanwala,M., Ramkumar,J., Arvizu,C., Gietzen,K.J., Lal,P.G.,
Azimzai,Y., Khan,F.A., Thangavelu,K., Thornton,M., Lu,D.A.,
Tribouley,C.M., Warren,B.A., Ison,C.H., Das,D., Raumann,B.E.,
Policky,J.L. and Kearney,L.
TITLE Transporters and ion channels
JOURNAL Patent: WO 0240541-A 29 23-MAY-2002;
Incyte Genomics, Inc. (US)
FEATURES Location/Qualifiers
source 1..3239
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
/note="Incyte ID No: 6585710CB1"
ORIGIN



Query Match 63.0%; Score 1680.6; DB 6; Length 3239;

Best Local Similarity 99.8%; Pred. No. 0;

Matches 1683; Conservative 0; Mismatches 4; Indels 0; Gaps 0;

Qy   983 GGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 1042
         |||  | ||||||||||||||| |||||||||||||||||||||||||||||||||||||
Db    12 GGGGCGGCCAGCACATGGTCCATTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 71



Qy  1043 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 1102
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    72 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 131

Qy  1103 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 1162
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   132 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 191

Qy  1163 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 1222
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   192 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 251

Qy  1223 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 1282
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   252 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 311

Qy  1283 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 1342
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   312 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 371

Qy  1343 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 1402
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   372 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 431



Qy 1403 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 1462

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 432 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 491 

Qy 1463 TCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 1522

I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 492 TCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 551

Qy 1523 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 1582 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 552 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 611

Qy 1583 CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 1642 

I I I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
Db 612 CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 671 

Qy 1643 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 1702 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 672 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 731 

Qy 1703 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 1762

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I I I 
Db 732 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 791 

Qy 1763 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 1822 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I 
Db 792 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 851

Qy 1823 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 1882

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 

Db 852 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 911 

Qy 1883 TCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAA 1942

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 912 TCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAA 971 

Qy 1943 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 2002 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 972 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 1031 

Qy 2003 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 2062 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 

Db 1032 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 1091 

Qy 2063 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGAT 2122

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1092 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGAT 1151

Qy 2123 TCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCT 2182 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1152 TCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCT 1211 

Qy 2183 CCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACAT 2242 

I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I 

Db 1212 CCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACAT 1271

Qy 2243 CCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAG 2302 



Db 1272 CCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAG 1331

Qy 2303 TCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTC 2362

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I 

Db 1332 TCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTC 1391

Qy 2363 GGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTC 2422

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1392 GGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTC 1451 

Qy 2423 GATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTG 2482 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1452 GATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTG 1511 

Qy 2483 GAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCA 2542

I I I I I I I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1512 GAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCA 1571 

Qy 2543 CACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCC 2602 

I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
Db 1572 CACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCC 1631 

Qy 2603 ATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAATTAAAAATGT 2662

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1632 ATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAATTAAAAATGT 1691 

Qy 2663 ATTGAGC 2669 

I I I I I I I 

Db 1692 ATTGAGC 1698 
Query Match 56.9%; Score 1518.6; DB 10; Length 4829;

Best Local Similarity 77.6%; Pred. No. 0; 

Matches 1973; Conservative 0; Mismatches 514; Indels 56; Gaps 9; 

Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 

Db 110 CATGGCTGAGAAGACCAAAGAGGAGACCCAGCTGTGGAACGGGACTGTACTCCAGGATGC 169 



Qy 159 CTCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTA 218 

II II II I I I I 11 II I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I 
Db 170 TTCAAGCCTCCAGGACAGCGTGTTCTCCTCTGAAAGTGACAACAGCCTCTACTTCACCTA 229



Qy 219 CAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTC 278

I I I I I I III II II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 230 CAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATGGCCTC 289

Qy 279 TCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTG 338 

I I I I I I I I I I I I I M I I I II I I II I I I I I I II I I I I I II I I I I I I I I I II I 
Db 290 TCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGTTACCGTGGAGGTCTCGCGGCAG 349

Qy 339 CCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGAT 398 

I I I I I II II II I I I I I I I I I I II II I I I I II I I I I I II I II I I I I I I I I 
Db 350 CCAGGACTCCTGGGATCTGGGCATCCGAAATCTGAGCTTCAAAGTGAGGAGTGGACAGAT 409

Qy 399 GCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGG 458 

MINI I I I I I I I I I I I I I I I I II I I M I I I I I I II II II II I I I I I II 
Db 410 GCTGGCTATCATAGGGAGCGCAGGCTGCGGGAGAGCCACATTACTCGACGTTATCACAGG 469 

Qy 459 CCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTC 518 

I III II II I I I I I I I I II I I I I I II I I I I I I I I II I I I II I I I I I I I 
Db 470 CAGAGACCATGGTGGCAAGATGAAATCAGGACAAATCTGGATAAACGGGCAACCCAGCAC 529

Qy 519 GCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAA 578 

I I I I I I I I I I I I I M I I I I I I I I II I I M I I I I II I I I I I I M M I I I I I I I 
Db 530 GCCTCAGCTGATACAGAAGTGTGTGGCACATGTGCGCCAGCAAGACCAGCTGCTCCCCAA 589

Qy 579 CTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTC 638 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 590 TCTGACTGTCAGAGAGACCCTGACTTTCATCGCCCAGATGCGCCTGCCCAAGACCTTCTC 649

Qy 639 CCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTG 698 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 650 TCAGGCCCAGCGAGACAAACGGGTGGAAGACGTGATTGCGGAGCTGCGGCTGCGGCAGTG 709 

Qy 699 CGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAG 758

III I I I I I I I I II I I I I I I I I I I I I I II III I I I I I I I I I I I II I I I II 
Db 710 CGCCAACACCCGCGTGGGCAACACATACGTACGCGGGGTGTCCGGGGGCGAGCGCCGAAG 769

Qy 759 AGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCAC 818 

III I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I I 
Db 770 AGTGAGCATCGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATCCTGGATGAACCCAC 829

Qy 819 CTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAA 878

II II I I I I I I I I I I M I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 830 TTCCGGCCTCGACAGCTTCACCGCTCACAACCTGGTGAGAACTTTGTCCCGCCTGGCCAA 889 

Qy 879 AGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTT 938 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 890 AGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATT 949

Qy 939 TGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACAT 998 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I 
Db 950 TGACCTGGTCCTTCTGATGACGTCTGGCACCCCTATCTACCTGGGGGTGGCACAGCACAT 1009 

Qy 999 GGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTT 1058 

III I I I I I II III I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1010 GGTGCAGTACTTTACATCAATTGGCTACCCTTGTCCTCGCTACAGCAACCCTGCTGACTT 1069 

Qy 1059 CTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGA 1118 



Db 1070 CTACGTGGACTTGACGAGCATTGACAGGCGCAGCAAAGAACAGGAGGTGGCCACCATGGA 1129 

Qy 1119 GAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCT 1178

I I I I I I I I I III II I I II II II I I I I II I I I I I I I I III II I I I I I I II 

Db 1130 GAAGGCTCGATTACTTGCAGCCTTGTTCCTAGAAAAAGTGCAAGGCTTTGACGACTTTCT 1189

Qy 1179 ATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCC 1238 

I I I M I I I I I I I II I I I I I I I I I I I I I I I I III II 

Db 1190 GTGGAAAGCTGAGGCAAAGAGTCTCGACACAGGCACCTATGCAGTCAGCCAGACCCTCAC 1249

Qy 1239 ACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTAC 1298 

II I I I I I II I M I I I I I I I I I I M I I I I I I I I I M I I I I I 

Db 1250 ACAGGACACCAACTG---TGGAACTGCTGCTGAGCTGCCCGGGATGATACAGCAGTTTAC 1306

Qy 1299 GACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCA 1358 

II I I M I I I I I I I II I I I I I II I M I M I I I I I III I I I I I I M I I I I II I I I 

Db 1307 CACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACTTGCCCACCCTGTTCATCCA 1366 

Qy 1359 TGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAG 1418 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I MINI 

Db 1367 TGGAGCAGAAGCCTGCCTGATGTCTCTCATCATTGGCTTCCTTTACTACGGCCACGCAGA 1426 

Qy 1419 CATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCC 1478 

II I I I M I I I M I I I I I I I I I I I I II I I I I I I I I I I I M I I I I I I I I I 

Db 1427 CAAGCCGCTCTCCTTCATGGACATGGCAGCCCTCCTGTTCATGATAGGAGCACTCATTCC 1486

Qy 1479 TTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTA 1538

III II I I II I I I I I I I I I I I I I I I I I I I I II I I I I III II I I I I I Mill 

Db 1487 TTTTAATGTCATTCTGGATGTCGTCTCCAAATGTCACTCGGAGCGGTCGCTGCTGTACTA 1546 

Qy 1539 TGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGA 1598 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II 
Db 1547 TGAACTGGAGGACGGACTGTACACTGCTGGTCCTTATTTCTTTGCCAAGGTCCTCGGTGA 1606 

Qy 1599 GCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAA 1658 

III II I I I I I I I I I I I I I I I I I II I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1607 GCTGCCAGAGCACTGTGCCTATGTCATCATCTATGGGATGCCCATCTACTGGCTGACCAA 1666

Qy 1659 CCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTT 1718 

I I I I I I I I I M I III I I I I M I I I I I I I I I I I I I I I M I I I I I I I I I M 

Db 1667 CCTGCGGCCAGGGCCTGAGCTCTTCCTCCTGCACTTCATGCTTCTGTGGCTGGTGGTGTT 1726 

Qy 1719 CTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTC 1778 

III I I I I I I I M I I I I I I I I I I M I I III I I I I I I I I I I I I I I I I I I I MM 
Db 1727 CTGCTGCAGGACCATGGCCCTGGCCGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTC 1786

Qy 1779 CTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTT 1838 

Mill I II I I II II I M I I I II II II I M II I I I M I I I II I I I I I I I I I I 
Db 1787 CTTCTGCTGCAACGCTCTCTACAACTCCTTCTACCTTACGGCTGGCTTCATGATAAACTT 1846

Qy 1839 GAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTT 1898 

II II I I II I I II III II M I I I I II II I I I I II I I I I I I II I I II I I II 

Db 1847 GAACAACCTGTGGATAGTACCTGCATGGATTTCCAAGATGTCGTTCCTCCGGTGGTGCTT 1906 

Qy 1899 TGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCT 1958

I II I II II I I I I I M I II II I M I II I I III I I I I M II I

Db 1907 CTCAGGGCTGATGCAGATTCAGTTTAATGGACACATTTACACCACGCAGATCGGCAACCT 1966



Qy 1959 CACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCT 2018 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I M I I I I I I I I 

Db 1967 CACCTTCTCCGTCCCCGGAGACGCGATGGTCACTGCCATGGACCTGAACTCACATCCTCT 2026

Qy 2019 CTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGT 2078

II II I II I II I I I I II I I I II I I II I I I I I I I II I I M II I I I I I I II I 
Db 2027 TTATGCGATCTACCTCATCGTCATTGGCATCAGCTGTGGCTTCCTGTCCCTGTATTATCT 2086

Qy 2079 GTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTG 2138 

I I I I I I I I I II M I I I I I I I I I I I I I I I II I I I I I I II I II I I 

Db 2087 GTCCTTGAAGTTCATCAAACAGAAGTCAATTCAAGATTGGTGAT---GTTCAGCCTTGCT 2143

Qy 2139 CCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTC 2198 

I I I I I I I I II II II II I I I II I I I I I I I I I I 
Db 2144 TCCACTGGTGGGACCCTTCTGCCTGGGCT-----------GGCCGCCTCCTGAGGAGCCC 2192

Qy 2199 CTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTG 2258 

I M I I I I I I I I I I I I I I M I I III I I II I I I I I I I I I 

Db 2193 GACTGAGGACAATGATCCCACAGATCTCAAGCAGCATCGGCGTCTTGGTGCTG 2245 

Qy 2259 CAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTG 2318 

I I I I I I I I I I I I II II I II I II I I I I I I M I I I I I I I I I I I I I II III I I I I I I 
Db 2246 CAGTGGCACAGGTCAGCCACAGGATGGCAGTAGAATAAAGACAGTTGAGAGG--TTTCTG 2303

Qy 2319 CTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACG 2378 

I I I I I I I I II III I I I I I I I M I I I I 

Db 2304 CTCACAGGCCTGGGCTTGTG-------------AAACAGGTACTTCGTGAACCTGTAACG 2350

Qy 2379 TTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTCGATATAGGATGGGAGC 2438

I I I I I I I I I I I I III I I I I I II I I I I I I I I I I I I I II I 

Db 2351 TTGCTCATTCATTT--------TATATCTCTATATAAACAACCCAGTATGGAATGGGAAC 2402

Qy 2439 AAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACA 2498

MM I II I M I I I I I I I II I I I I M I I II II I I II I I I I I I I 
Db 2403 CAATTATTTATGAATTGAGTAGCTAGGCTATGCAGAGACTTGTGGAACCCCGAGAGGATA 2462

Qy 2499 ATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCC 2558 

I I I II I I I II I II III I I I I II I I I II 

Db 2463 GTGGTCTGTAGCAAAACATTTAGCTTTCTCCACCA-------ATCTCACCCTGTTAAGCC 2515

Qy 2559 ACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTG 2618 

I I I I II II I I I I I I I I II I I I II I I I I I II I I I I I I I II 
Db 2516 GCTCCCGATACAGAGGGTGACCTAAAACTGACTAG--AAATGTCCTCTCTTATCTCTGTG 2573

Qy 2619 TGGGGTCATGGGCTCCAAAAGCC 2641 

III II I II I I I I I I I 
Db 2574 TGGCTCCATGGACTTCCAGAGTC 2596 



RESULT 6 
AF324495 

LOCUS AF324495 3674 bp mRNA linear ROD 07-AUG-2001 

DEFINITION Mus musculus sterolin-2 (Abcg8) mRNA, complete cds.

ACCESSION AF324495

VERSION AF324495.1 GI:15088541 



KEYWORDS 

SOURCE Mus musculus (house mouse)

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE 1 (bases 1 to 3674) 

AUTHORS Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
Patel,S.B.

TITLE Two genes that map to the STSL locus cause sitosterolemia: genomic

structure and spectrum of mutations involving sterolin-1 and 
sterolin-2, encoded by ABCG5 and ABCG8, respectively 
JOURNAL Am. J. Hum. Genet. 69 (2), 278-290 (2001) 
MEDLINE 21344600 
PUBMED 11452359 
REFERENCE 2 (bases 1 to 3674) 

AUTHORS Lu,K., Lee,M.-H. and Patel,S.B.
TITLE Direct Submission 

JOURNAL Submitted (29-NOV-2000) Division of Endocrinology, Diabetes and

Medical Genetics, Medical University of South Carolina, 114 Doughty 
Street, STB541, Charleston, SC 29403, USA 
FEATURES Location/Qualifiers 
source 1..3674
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6"
/db_xref="taxon:10090"
/tissue_type="liver"
gene 1..3674
/gene="Abcg8"
CDS 102..2123
/gene="Abcg8"
/note="ABCG8"
/codon_start=1
/product="sterolin-2"
/protein_id="AAK84079.1"
/db_xref="GI:15088542"

/translation="MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQ 
SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQML 
AIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLP 
NLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE 
RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSD
IFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKE 
REVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVEL 
PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAA
LLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV
IIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNAL
YNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI 
LGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW" 

ORIGIN 



Query Match 56.6%; Score 1511.6; DB 10; Length 3674; 

Best Local Similarity 77.0%; Pred. No. 0; 

Matches 1965; Conservative 0; Mismatches 534; Indels 53; Gaps 8; 



Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158



Db 101 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 160 

Qy 159 CTC---GGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCAC 215

II II II I I I II I I I II II I I I I I I II I I I I I I I II I I I I I I I I II I I I I I I I 

Db 161 TTCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCAC 220

Qy 216 CTACAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGC 275

I I I I I I I II III II I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I II I II 
Db 221 CTACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGC 280 

Qy 276 CTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAG 335 

I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I M I I I I II I I I I I I I I I I I M 
Db 281 CTCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAG 340

Qy 336 CTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCA 395

I I I I I I II I I I I I I II I I I I I M I II I I I I I I I I I I II I I I I I I I II I II 
Db 341 CAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACA 400 

Qy 396 GATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCAC 455 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I II II I I I I I I I I 
Db 401 GATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCAC 460

Qy 456 TGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAG 515 

III I I I I I I I I I I II I I I I I II I I I I I II II I M II I I II II I I I I I I I 

Db 461 AGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAG 520

Qy 516 CTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCC 575 

I I I I I I I I II I I I I I I I I I I II II II II I I I I I I I I II I I I I II I I I II 
Db 521 TACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCC 580 

Qy 576 CAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTT 635

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 581 CAACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTT 640 

Qy 636 CTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCA 695 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 641 CTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCA 700 

Qy 696 GTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAG 755

MINI I I I II I I I I I I I I I I I III II II Ml I II I I I I I I I I I II I I I 
Db 701 GTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCG 760

Qy 756 GAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACC 815 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I Mill II II II I 
Db 761 ACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACC 820

Qy 816 CACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGC 875 

I I I Mill II II I I I II II II II I I I I I II I II M I I I II II I II II I II I II 
Db 821 CACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGC 880 

Qy 876 CAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCT 935 

I II I I II II M I I II II I II I I I I I II I I II I II II II II II I II II II II I II II II 
Db 881 CAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCT 940 

Qy 936 GTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCA 995 

I I II I II II II I I II II II II I I I I I II I II I II I I II I I I II II I I II I I I 



Db 941 ATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCA 1000 

Qy 996 CATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGA 1055 

I I I I I I I I I I I I I I I I I I I I III I I II I II I I II I II I I I I I I I II I I II 
Db 1001 AATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGA 1060 

Qy 1056 CTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAG 1115 

I II I I I I I I I I I I I I I II I II I I I I I I I I II I I I III I III I I I I I I I I 
Db 1061 CTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGT 1120 

Qy 1116 GGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTT 1175 

I I M I I I I I I I I I I II I I I I M I I I M I I I I I I I I I I I I I III I I I I I II I 
Db 1121 GGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTT 1180

Qy 1176 TCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGAC 1235

I I I I I I I II I I I I I I I I I I I I I II I I I I I III 

Db 1181 TCTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCT 1240 

Qy 1236 CCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTT 1295

I II I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I 

Db 1241 CACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTT 1297

Qy 1296 TACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCAT 1355

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 1298 TTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCAT 1357 

Qy 1356 CCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGG 1415 

I II I I I I I I I I I I I I I I I I I I II I I III M I I I I I I II I I I I I I I I I 
Db 1358 TCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGG 1417 

Qy 1416 GAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCAT 1475 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I II II I I I I I 
Db 1418 GGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCAT 1477 

Qy 1476 CCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTA 1535 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MM M I I I I I I I I I I I II 

Db 1478 TCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTA 1537

Qy 1536 CTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGG 1595 

I I II I I I I I I M II I II I I II I II I I II I I I I I I M II II II I I I I M I I I M II 
Db 1538 CTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGG 1597 

Qy 1596 GGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGC 1655 

II I II II I I I I I I II I I I I II I I I I I I II II I I II I I M II I I I II I I II I I 

Db 1598 AGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGAC 1657 

Qy 1656 CAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGT 1715 

I II II I II I I I I I II II I I II I I I I I I I I M I M I II I I I I I II II I 
Db 1658 AAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGT 1717

Qy 1716 CTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGC 1775 

I M I I I I I I II I I II I I II M I I I III I III II II M I II M I I I I I I I I I 
Db 1718 CTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTC 1777 

Qy 1776 CTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAA 1835 

I I II I II II I I II I II I II II II I I II II I II II I I I I I I II II I II I I I I II I 
Db 1778 CTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAA 1837 



Qy 1836 CTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTG 1895

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 1838 CTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTG 1897

Qy 1896 TTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAA 1955 

II I II I I II I I I I I I I I I I I I II II I I I I I I II II 

Db 1898 CTTCTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAA 1957 

Qy 1956 CCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCC 2015 

I I I I I I II I II II I I I I II I I I I II M I I M I I II I I M I I II 
Db 1958 CTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCC 2017

Qy 2016 TCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTA 2075 

Mill II II I I I I I I I II I I I I I III I I I II MINI M I I I I I I I I M 
Db 2018 ACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTA 2077

Qy 2076 CGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGT 2135

I M I I I I I I II I I I I M I I I I I I I I I II I I II I I M I I I I I I 
Db 2078 TCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGATACTCAGCCTTGCT 2137

Qy 2136 CTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCC 2195 

I I I I I I I II II I I I I I I I I 

Db 2138 CTCACTGGCGG GACCCTTTTCCCGGGGCTGGCCACCCCAGGAGGAGCC 2185 

Qy 2196 TTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTG 2255 

I I I I I I I I I I III III III II I I I I II III 

Db 2186 GGACTGGGGACAAGGCTCACACAGATCTCTCAG GCAGCAGCCACCTCTTAGTG 2238 

Qy 2256 CTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTT 2315 

M I I I I I I I I I I I I I I I II I I I I I I M I I I I I I I I I I I I I I I I I I I II III III 
Db 2239 CTGCAGTGGCACAGGTCAGCCACAGGATGGCAGTAGAATAAAGACAGTTGAGAGGTGTTT 2298 

Qy 2316 CTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTCGGTGGCACCTACA 2375 

I I I I I I I I I I I I I I I I I I I I I I I I Ml 

Db 2299 CTGCTCCCAGGCCCAGGCTTGTGATGGGAGAGAGAGAA ACCAGGT 2343

Qy 2376 ACGTTGCTT^TTTATTTCCTTTTGATATGCATTTATATAGGCAACTCGATATAGGATGGG 2435 

II I I II II I I MM III I II I I II II II I I I I I II II I 

Db 2344 ACGTTGCTCATGCATTT TATATCTTTAAATAAACAACCCAGTATGGAATGGG 2395 

Qy 2436 AGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGA 2495

I I II II II I I II M II I II II I II II II I I I II II II I I I II I 

Db 2396 T^CCAATTATATATGAATTGAGTAGCTAGGCTATGCAGAAATTTCTGGAATCCTGAGAGG 2455 

Qy 2496 ACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGA 2555 

III I II I I II I I II I I I II II I II I I 

Db 2456 ATAGTGGTTTATAGCAAAGTGTTTAACTTTCTCTTCTACCATTCTCACAC TGTTAA 2511 

Qy 2556 GCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATG-CCATCCCTTCTTTT 2614 

II II I I I II II I I I II II II I I II I II II M I II II I I I 

Db 2512 GCCACTCCCAATACAAAGGGCGACCTAAAACAAACTAGCAAAATGTTTTTCGCTTATCTC 2571

Qy 2615 TGTGTGGGGTCATGGGCTCCAAAAGCCAACGT 2646 

II I II I I II II I II II II II II II 
Db 2572 TGCGTGGATTCATGGACTCCAACCCCCAAAGT 2603 
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AY196216 2284 bp mRNA linear ROD 01-JUN-2003

Mus musculus strain PERA/Ei ATP-binding cassette sub-family G 
member 8 (Abcg8) mRNA, complete cds.
AY196216 

AY196216.1 GI:31322261

Mus musculus (house mouse) 
Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

1 (bases 1 to 2284) 

Wittenburg,H., Lyons,M.A., Li,R., Churchill,G.A., Carey,M.C. and
Paigen,B.

Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
Mice 

Unpublished 

2 (bases 1 to 2284) 

Lyons,M.A., Wittenburg,H., Walsh,K.A., Carey,M.C. and Paigen,B.
Direct Submission 

Submitted (12-DEC-2002) The Jackson Laboratory, 600 Main Street,
Bar Harbor, ME 04609, USA 

Location/Qualifiers

1..2284

/organism="Mus musculus"

/mol_type="mRNA"

/strain="PERA/Ei"

/db_xref="taxon:10090"

/chromosome="17"

/map="55 cM"

/sex="male"

/tissue_type="liver"

1..2284

/gene="Abcg8"

102..2120

/gene="Abcg8"

/note="ATP-dependent canalicular cholesterol transporter;
white subfamily"

/codon_start=1

/product="ATP-binding cassette sub-family G member 8"
/protein_id="AAO45096.1"
/db_xref="GI:31322262"

/translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS 
NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA 
IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI 
FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER 
EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP 
GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL 
LFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
IYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALY
NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL 
GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW" 



ORIGIN 

Query Match 54.5%; Score 1454.2; DB 10; Length 2284; 

Best Local Similarity 79.8%; Pred. No. 1.3e-311; 

Matches 1760; Conservative 0; Mismatches 423; Indels 22; Gaps 3;

Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 101 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 160 

Qy 159 CTCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTA 218 

I I I I I I I I I I I I I I II I I I I I I I II II I II I I I M M I I I I I I I I I I I I I M I II 
Db 161 TTCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTA 220 

Qy 219 CAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTC 278 

MINI III I I I I I I I I I I I I I II I II I II I I I I I I I I I I I I I I M I I I I I I I 
Db 221 CAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTC 280 

Qy 279 TCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTG 338

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I M I M I I I I I I I II I I III I 
Db 281 TCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAG 340

Qy 339 CCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGAT 398

III I II I I I I I II I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 341 CCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGAT 400 

Qy 399 GCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGG 458 

I I I I I I I I I I I M I I I I I I M I I I II I I M I I I I II I I II II I I I I I I I I II 
Db 401 GCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGG 460 

Qy 459 CCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTC 518 

I I I II I I I I I I I II I I I I II I I I I I II II I I I I I II I I I I I I II I I I I 
Db 461 CAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTAC 520 

Qy 519 GCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAA 578

I I I I I I M I I I I I I I I I I I I I II II II I I I I I I I I I I MM I I I II II I II 
Db 521 GCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAA 580

Qy 579 CTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTC 638 

I I I I I II II I M I I I MM II II I I I I II I II I I II I I I II I I I I II I II II 
Db 581 CCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTC 640 

Qy 639 CCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTG 698 

I I I I II I I I I I II I M I I I I I I I II I Mill I II I I M I M I I I I I I M I I II I 

Db 641 CCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTG 700 

Qy 699 CGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAG 758 

III I I I I I I I M I I I II I I I I I II I I I II II M I M I I I M I I II I I 
Db 701 CGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACG 760

Qy 759 AGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCAC 818 

III I I I I I I I II I I I I II II I I I I I I I I I II M I I II II II Mill II I II M II I 
Db 761 AGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCAC 820 

Qy 819 CTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAA 878 

I I I I I I I I I II I I II II II I I I I I I II I II I I I II II I I I I I I I I I II I I II I 
Db 821 TTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAA 880 



Qy 879 AGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTT 938 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I II 
Db 881 GGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATT 940 

Qy 939 TGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACAT 998 

III I II I I I I I I I II I I I I I I I I I I I I I I I Mill! I M I I I I I I I I II I II 
Db 941 TGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAAT 1000 

Qy 999 GGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTT 1058 

III I I I I I I I I I I I I I I I III I I I I I I I M I I I I I I I I II I I I I I I Mill 
Db 1001 GGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTT 1060 

Qy 1059 CTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGA 1118 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I III I I I I I I I I III 
Db 1061 CTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGA 1120

Qy 1119 GAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCT 1178

I I M I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 

Db 1121 GAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCT 1180 

Qy 1179 ATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCC 1238

I I I I I I I I I I I I I I I I I I I II I I I I I III II 

Db 1181 GTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCAC 1240

Qy 1239 ACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTAC 1298 

II I II I I I I I I I I I I I I I I I I I I M II I I I I I I I I I 

Db 1241 ACAGGACACTGACTG TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTC 1297

Qy 1299 GACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCA 1358

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II Mill II 

Db 1298 CACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCA 1357 

Qy 1359 TGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAG 1418 

I I I I I I I I I I I II I I II I I I I I I I I I II I I I II I I I I II I I I I I I I 

Db 1358 TGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGC 1417 

Qy 1419 CATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCC 1478

II I I II I I M I I I I I I II I I I I I I I I I I I I I II I I I I II II II II II I II 

Db 1418 CAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCC 1477 

Qy 1479 TTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTA 1538 

I II I II II I I I I I I I I I I I I I II I I II I II I I I I I I I II I I I I I I I I I II I I I 
Db 1478 TTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTA 1537 

Qy 1539 TGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGA 1598 

III II I II I I I II I I I I I I I I I I I I I I I II II I I I I II II I II II I I I I I II II 

Db 1538 TGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGA 1597 

Qy 1599 GCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAA 1658 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M I M I I I I I I II I II II I M 
Db 1598 ATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAA 1657 

Qy 1659 CCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTT 1718 

I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I II M I I I I 
Db 1658 CCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTT 1717 



Qy 1719 CTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTC 1778

III I I I II I I I I II I II I I II III I III I I I I II I I I I I I I I I I I II I I I I 
Db 1718 CTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTC 1777 

Qy 1779 CTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTT 1838

M I I I I I I I I I II I I I I II M I I I I II II I I I I M I I I I I I I I I I I II II II I I 
Db 1778 CTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTT 1837 

Qy 1839 GAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTT 1898 

I M I I I I I I I I I I I I II II I I I I I I II I I I II I Mill I I I II II I II 
Db 1838 GGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTT 1897 

Qy 1899 TGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCT 1958

I I I I I I I I I I I I I I I I I I I II I I I I I I I M I M I I 

Db 1898 CTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTT 1957 

Qy 1959 CACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCT 2018

I I M I I I II I I M I I M I I I M I I I I I I I I I I I I I I I I I I I I I 

Db 1958 CACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACT 2017 

Qy 2019 CTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGT 2078

II I II I I M I I M I I I I I M I I M I M I I I II I I I I I M I I I I I I I I I 

Db 2018 CTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCT 2077 

Qy 2079 GTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTG 2138 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I III 
Db 2078 ATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGATACTCAGCCTTGCTCTC 2137

Qy 2139 CCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTC 2198 

I I I I I II I I I I I I I I I I 

Db 2138 ACTGGCGG GACCCTTTTCCCGGGGCTGGCCACCCCAGGAGGAGCCGGA 2185 

Qy 2199 CTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTG 2258 

I I I I I M I I I III III III II I I I I II I I I I I I 

Db 2186 CTGGGGACAAGGCTCACACAGATCTCTCAG GCAGCAGCCACCTCTTAGTGCTG 2238 

Qy 2259 CAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAGT 2303 

I I I I II I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I M M 
Db 2239 CAGTGGCACAGGTCAGCCACAGGATGGCAGTAGAATAAAGACAGT 2283 
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/organism="Mus musculus" 

/mol_type="mRNA" 

/strain="I/LnJ" 

/db_xref="taxon:10090"

/chromosome="17"

/map="55 cM"

/sex="male"

/tissue_type="liver"

1..2285

/gene="Abcg8"

102..2120

/gene="Abcg8"

/note="ATP-dependent canalicular cholesterol transporter; 
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/codon_start=1

/product="ATP-binding cassette sub-family G member 8"

/protein_id="AAO45095.1"

/db_xref="GI:31322260"
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IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN
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EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAAELP
GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL
LFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
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Query Match 54.3%; Score 1449.4; DB 10; Length 2285; 
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Matches 1755; Conservative 0; Mismatches 421; Indels 23; Gaps 3;



Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 101 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 160

Qy 159 CTCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTA 218

I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I II I II II II I I I I I II I I I I I I I
Db 161 TTCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTA 220

Qy 219 CAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTC 278

III II III I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I M I I I I II I I I
Db 221 CAGCGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTC 280



Qy 279 TCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTG 338 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 281 TCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAG 340 

Qy 339 CCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGAT 398 

III I II I I I I I II I I I II I I I I II II I II I I II I II I I I I I I I I I I I I I I I 
Db 341 CCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGAT 400 

Qy 399 GCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGG 458 

I I I I I I II I I I I I II I I I II I I II II I II I I I I II I I I II II I I I I I I I I II 
Db 401 GCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGG 460

Qy 459 CCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTC 518

I I I I I I I I I I I I I I I I I I II I I I I I II II I I I I I II I I I I I I I I I I I 

Db 461 CAGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAACGGGCAACCCAGTAC 520

Qy 519 GCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAA 578 

II I II I I I I I I I I II I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 521 GCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAA 580 

Qy 579 CTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTC 638

I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 581 CCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTC 640 

Qy 639 CCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTG 698 

I I I II I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 641 CCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTG 700 

Qy 699 CGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAG 758 

III I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 

Db 701 CGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACG 760

Qy 759 AGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCAC 818 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 761 AGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCAC 820

Qy 819 CTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAA 878 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 821 TTCTGGCCTCGACAGCTTCACAGCCCACAACCTGGTGACAACCTTGTCCCGCCTGGCCAA 880 

Qy 879 AGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTT 938 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 881 GGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATT 940 

Qy 939 TGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACAT 998 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I II 
Db 941 TGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAAT 1000 

Qy 999 GGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTT 1058 

III I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1001 GGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCAGACTT 1060 

Qy 1059 CTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGA 1118 

III I I I I I I I I M I I I I I I I I I I I I I I I I I I III I III I I I I I I I I III 
Db 1061 CTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGA 1120 



Qy 1119 GAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCT 1178 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I III I I II II I I II I 

Db 1121 GAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTGCAAGGCTTTGATGACTTTCT 1180 

Qy 1179 ATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCC 1238 

I I I I I I I I I I I I I II M I I II I I I I I III II 

Db 1181 GTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCAC 1240 

Qy 1239 ACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTAC 1298 

II I I I II I I I I I I I I I I I I I I I I I I I II M I I I I I I I 

Db 1241 ACAGGACACTGACTG TGGGACTGCTGCTGAGCTGCCCGGGATGATAGAGCAGTTTTC 1297 

Qy 1299 GACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCA 1358 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I II 

Db 1298 CACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCA 1357

Qy 1359 TGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAG 1418 

MM II II I I I I I I II II II I I I I M I I I I I I I I I II I II II II I II 
Db 1358 TGGGTCGGAAGCCTGCCTGATGTCCCTCATCATCGGCTTCCTTTACTACGGCCATGGGGC 1417 

Qy 1419 CATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCC 1478 

II I I I II I I II I I II II I I II M I I I I I I I M II I I I I II II II I II II 
Db 1418 CAAGCAGCTCTCCTTCATGGACACGGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCC 1477 

Qy 1479 TTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTA 1538

M I I I I I I I I I II M II I I I I M I II I II I I II I I I I II II II I I I II I I I I I 
Db 1478 TTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTA 1537

Qy 1539 TGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGA 1598 

Ml I M II I II I I I I I M II I I I I II I I I I I I I II I I I I I M II II I M I II II 

Db 1538 TGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGA 1597 

Qy 1599 GCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAA 1658 

I I II I I I II I I I I I II I I I I I II II I I I I I I I I II I I I I I I I I II I M I I II 
Db 1598 ATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAA 1657

Qy 1659 CCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTT 1718 

I I I I I I I I I I III I I I I I I M I I II I I II II I II I I I I II M I I I M 

Db 1658 CCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTACTGCTTGTGTGGTTGGTGGTCTT 1717 

Qy 1719 CTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTC 1778

Ml II I I I I I II I I II I I I I I III I III I I I I II I II I II I M II M I I I I 
Db 1718 CTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTC 1777 

Qy 1779 CTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTT 1838 

II I I I II II II I I I I M I II I II I II II I I II I I I III I II II I M I I M I I I I I 

Db 1778 CTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACCGCCGGCTTCATGATAAACTT 1837

Qy 1839 GAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTT 1898 

I II I M I I I I I Mini II I I II I I I I I I II I I I II I I M II II I I II 
Db 1838 GGACAACCTGTGGATAGTGCCTGCATGGATATCCAAGCTGTCGTTCCTCCGGTGGTGCTT 1897 

Qy 1899 TGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCT 1958

II I I I I I II II I II I I I I I II I II I I I I II I I M I 

Db 1898 CTCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTT 1957

Qy 1959 CACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCT 2018 



Db 1958 CACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACT 2017 

Qy 2019 CTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGT 2078 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2018 CTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCT 2077

Qy 2079 GTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTG 2138 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 
Db 2078 ATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGATACTCAGCCTTGCTCTC 2137 

Qy 2139 CCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTC 2198 

III I I I I I II I I I I I I I I 

Db 2138 ACTGGCG GGACCCTTTCCCGGGGCTGGCCACCCCAGGAGGAGCCGGA 2184 

Qy 2199 CTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTG 2258

I I I I I I I I I I III III III II I I I I II MINI 

Db 2185 CTGGGGACAAGGCTCACACAGATCTCTCAG GCAGCAGCCACCTCTTAGTGCTG 2237 

Qy 2259 CAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAA 2297 

I I II M I II I I I I I I I I II I I I I I I I I I M I I I I I I 

Db 2238 CAGTGGCACAGGTCAGCCACAGGATGGCAGTAAAATAAA 2276 
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AX685731 2019 bp DNA linear PAT 29-MAR-2003
Sequence 3 from Patent WO02081691.
AX685731
AX685731.1 GI:29371740

Mus musculus (house mouse)
Mus musculus

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
1

Hobbs,H.H., Shan,B., Barnes,R. and Tian,H.
Abcg5 and abcg8: compositions and methods of use
Patent: WO 02081691-A 3 17-OCT-2002;

Tularik Inc. (US); BOARD OF REGENTS UNIVERSITY OF TEXAS SYSTEM
(US)

Location/Qualifiers
1..2019

/organism="Mus musculus"
/mol_type="unassigned DNA"
/db_xref="taxon:10090"
CDS 1..2019

/note="unnamed protein product; mouse ABCG8 (mABCG8)"
/codon_start=1
/protein_id="CAD86571.1"
/db_xref="GI:29371741"
/db_xref="REMTREMBL:CAD86571"

/translation="MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQS 
NTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLA
IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPN 
LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 
RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDI
FRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKER
EVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP
GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAAL
LFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVI
IYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALY
NSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSIL 
GDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW" 



ORIGIN 



Query Match 53.6%; Score 1430; DB 6; Length 2019; 

Best Local Similarity 82.0%; Pred. No. 3.1e-306; 

Matches 1659; Conservative 0; Mismatches 360; Indels 3; Gaps 1; 

Qy 100 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 159

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60

Qy 160 TCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTAC 219 

I I I I I I I I I I I I I I II I II II II II II I I I I I I I I I II I II I I I I I I II II I I I I I 
Db 61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120

Qy 220 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 279

I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I 
Db 121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180

Qy 280 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 339 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II 

Db 181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240

Qy 340 CAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATG 399 

II I II I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300

Qy 400 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 459

I I I I I I I I I I I I I I I I I I I I I I I II I I I M I I I M I I II II I I I I I I I I III 
Db 301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360 

Qy 460 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 519 

I I I I I I I I I I I I I I I I I II I I I I I II II I I I I I I I I I I I I I I I I I I II 
Db 361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420

Qy 520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579

I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480

Qy 580 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 639 

I I I I II II I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I II I 

Db 481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540 

Qy 640 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 699 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

Qy 700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759

II I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660 



Qy 760 GTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 819 

II I II I I I I I I I I I I I I I I I II II M I I I I I II I I I M I I I II II II II II II II 
Db 661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720 

Qy 820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879

I I I II I I I I I I M I II II I M M II II I II II I I I I I M II I I I I II I I I I I I 
Db 721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

Qy 880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939

I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I II I II III 

Db 781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

Qy 940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999 

II I I I I I I I I I I I I II II I I I I I I I M I I II I II I I I I II I M I II I I I III 

Db 841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

Qy 1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059 

II I I I II I I I I II I II I III I I I I II I I I I I I M I I I I II II I I I I I I I I I 
Db 901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

Qy 1060 TATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAG 1119 

II I I I II I I I II I I I I I I I I I I I I I I I I I I III I III I I I I I I M MM 
Db 961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020 

Qy 1120 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1179

I I I II Mill II II II II II I II M II II II I II I I III II II I I II I II 

Db 1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080 

Qy 1180 TGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCA 1239 

II M I II I II I I II M I II II MM I III III 

Db 1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140

Qy 1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299

I I I I I I I I II II I II II M II M I II I II II M I I 

Db 1141 CAGGACACTGACTG TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197 

Qy 1300 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1359

II II II I II I II II II II M II II I I II II II I II II II II M I M Mill III 

Db 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257 

Qy 1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419 

II I I II I II II I II II II II I I II I II II II II II I II II M II I I 
Db 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317 

Qy 1420 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479 

I I II I I I II I II II II II II II r I II II I I II II II II II II II II I III 

Db 1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377 

Qy 1480 TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1539 

Mill II I II I II II II II II M M II II I Mil II II II II II II I MUM 
Db 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437 

Qy 1540 GAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAG 1599 

II II II I I II II II M II II II II I II II II II M II I II II I II M II II II 

Db 1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497

Qy 1600 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659



Db 1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557 

Qy 1660 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719 

III I II I I I III I I II I I II II I I II II I I I I I I I I I I II I II I I I II 
Db 1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617 

Qy 1720 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779 

II II I I I II I I I I I I I I I I I III I III II II I I I II II I I I I I I I I I I II I 
Db 1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677 

Qy 1780 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1839 

I I I I I I I I I I I I II II I I I II I I I I I I I I I I II I I I I I II I I I I I I I II I I I II 
Db 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

Qy 1840 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899 

II I I I I I I I I I I I I I I II I I I I I I I I I I MM II II I I II I II II II 
Db 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

Qy 1900 GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1959

II II I I II I I II II I I II I II I II I I I II I I I II I I 

Db 1798 TCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTC 1857 

Qy 1960 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTC 2019 

I II I I I II II II I I II I I II II I I I II II II I II I II I II I II 

Db 1858 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1917 

Qy 2020 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079

II II I II II I I II II I II II III II II I II II I I II I II M II I II I 

Db 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

Qy 2080 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2121 

I I I II I I II I I I II II II II III I I II I II I I I II I 
Db 1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019 
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AC084265 127066 bp DNA linear PRI ll-DEC-2001 

Homo sapiens chromosome 2, clone CTB-2367F13, complete sequence. 
AC084265 

AC084265.4 GI:17488659
HTG.

Homo sapiens (human)
Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

1 (bases 1 to 127066) 

Birren,B., Linton, L., Nusbaum, C. and Lander, E. 
Homo sapiens chromosome 2, clone CTB-2367F13 
Unpublished 

2 (bases 1 to 127066) 

Birren,B., Linton, L., Nusbaum,C., Lander, E., Abraham,H., Allen, N., 
Anderson, S., Barna,N., Bastien,V. , Beda,F., Boguslavkiy, L . , 
Boukhgalter, B. , Brown, A., Burkett,G., Campopiano, A. , Castle, A. , 
Choepel,Y., Colangelo, M. , Collins, S., Collymore, A. , Cooke, P., 
DeArellano, K, , Dewar,K., Diaz,J.S., Dodge, S., Ferreira,P., 



FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., Ginde,S., Goyette,M.,
Graham, L., Grand-Pierre, N. , Hagos,B., Heaford,A., Horton,L., 
Iliev, I., Johnson, R., Jones, C, Kann,L., Karatas,A., LaRocque,K., 
Lamazares,R. , Landers, T., Lehoczky,J., Levine,R., Lieu,C., Liu,G., 
Macdonald, P. , Marquis, N., McCarthy, M., McEwan,P., McKernan,K., 
McPheeters,R., Meldrim,J., Meneus,L., Mihova,T., Mlenga,V.,
Morrow, J., Murphy, T., Naylor,J., Norman, C . H. , O'Connor, T., 
O'Donnell,P., O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K.,
Pierre, N., Pisani,C., Pollara,V., Raymond, C, Rieback,M., Riley, R., 
Rogov,P., Rothman,D., Roy,A. , Santos, R., Schauer,S., Severy,P., 
Sougnez, C. , Spencer, B. , Stange-Thomann, N . , Stojanovic,N. , 
Strauss, N., Subramanian,A. , Talamas,J,, Tesfaye,S., Theodore, J., 
Tirrell,A., Travers,M., Trigilio,J., Vassiliev, H . , Viel,R., Vo,A. , 
Wilson, B., Wu,X., Wyman,D., Ye,W.J., Young, G., Zainoun,J., 
Zimmer , A. and Zody,M. 
TITLE Direct Submission 

JOURNAL Submitted (18-OCT-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA 
REFERENCE 3 (bases 1 to 127066) 

AUTHORS Birren,B., Linton, L., Nusbaum,C., Lander, E., Ali,A. , Allen, N,, 

Anderson, S., Barna,N., Bastien,V., Boguslavkiy, L. , Boukhgalter, B . , 
Brown, A. , Camarata,J., Campopiano, A. , Chang, J., Chazaro,B,, 
Choepel,Y,, Colangelo,M. , Collins, S., Collymore,A. , Cook, A. , 
Cooke, P., DeArellano,K. , Dewar,K,, Diaz, J. S., Dodge, S., Faro,S., 
Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., 
Ginde,S,, Gord,S., Goyette,M. , Graham, L., Grand-Pierre, N . , 
Hagos,B., Heaford,A. , Horton,L., Hulme,W. , Iliev, I . , Johnson, R., 
Jones, C, Kamat,A,, Karatas,A., Kells,C., LaRocque,K., 
Lamazares,R. , Landers, T., Lehoczky,J., Levine,R., Liu,G., 
MacLean,C., Macdonald, P . , Major, J., Marquis, N., Matthews, C, 
McCarthy, M., McEwan,P., McKernan,K., McPheeters , R. , Meldrim, J,, 
Meneus,L., Mihova,T., Mlenga,V., Murphy, T., Naylor,J., Nguyen, C, 
Norbu,C., Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D.,
Oliver, J., Peterson, K., Phunkhang, P . , Pierre, N., Pollara,V. , 
Raymond, C, Retta,R., Rieback,M,, Riley, R., Rise,C., Rogov,P., 
Roman, J., Rosetti,M., Roy, A., Santos, R., Schauer,S., Schupback, R. , 
Seaman, S., Severy,P., Spencer, B., Stange-Thomann, N. , Stojanovic, N . , 
Strauss, N., Subramanian,A. , Talamas,J., Tesfaye,S., Theodore, J., 
Topham,K., Travers,M., Travis, N., Trigilio,J., Vassiliev, H . , 
Viel,R., Vo,A., Wilson, B., Wu,X., Wyman,D., Ye,W.J., Young, G., 
Zainoun,J., Zembek,L., Zimmer, A. and Zody,M. 

TITLE Direct Submission 

JOURNAL Submitted (24-AUG-2001) Whitehead Institute/MIT Center for Genome 
Research, 320 Charles Street, Cambridge, MA 02141, USA 
REFERENCE 4 (bases 1 to 127066) 

AUTHORS Birren,B., Linton, L., Nusbaum, C, Lander, E., Ali,A. , Allen, N., 

Anderson, S., Barna,N., Bastien,V., Boguslavkiy, L. , Boukhgalter, B . , 
Brown, A., Camarata,J., Campopiano, A. , Chang, J., Chazaro,B., 
Choepel,Y,, Colangelo,M. , Collins, S., Collymore,A, , Cook, A. , 
Cooke, P., DeArellano, K, , Dewar,K., Diaz, J. S., Dodge, S., Faro,S., 
Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., 
Ginde,S., Gord,S., Goyette,M. , Graham,L., Grand-Pierre, N . , 
Hagos,B., Heaford,A. , Horton,L., Hulme,W., Iliev, I., Johnson, R., 
Jones, C, Kamat,A,, Karatas,A., Kells,C., LaRocque,K., 
Lamazares, R. , Landers, T., Lehoczky,J., Levine,R., Liu,G., 
MacLean,C., Macdonald, P . , Major, J., Marquis, N., Matthews, C, 
McCarthy, M., McEwan,P., McKernan,K., McPheeters , R. , Meldrim, J., 



Meneus,L., Mihova,T., Mlenga,V,, Murphy, T., Naylor,J., Nguyen, C, 
Norbu,C., Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D.,
Oliver, J., Peterson, K., Phunkhang, P . , Pierre, N., Pollara,V., 
Raymond, C, Retta,R., Rieback,M., Riley, R,, Rise,C., Rogov, P., 
Roman,J., Rosetti,M., Roy,A., Santos,R., Schauer,S., Schupback,R.,
Seaman,S., Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
Strauss, N., Subramanian,A. , Talamas,J., Tesfaye,S., Theodore, J. , 
Topham, K., Travers,M., Travis, N., Trigilio,J., Vassiliev, H . , 
Viel,R., Vo,A., Wilson, B,, Wu,X., Wyman,D., Ye,W.J., Young, G., 
Zainoun,J., Zembek,L., Zimmer,A. and Zody,M. 
TITLE Direct Submission 

JOURNAL Submitted (11-DEC-2001) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA 
COMMENT On Dec 11, 2001 this sequence version replaced gi: 15284200. 

All repeats were identified using RepeatMasker : 
Smit, A.F.A. & Green, P. (1996-1997) 

http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center 

Center: Whitehead Institute/ MIT Center for Genome Research 

Center code: WIBR 

Web site: http://www-seq.wi.mit.edu 

Contact: sequence_submissions@genome.wi.mit.edu 

Project Information 

Center project name: L11578 
Center clone name: 2367 F 13 



FEATURES Location/Qualifiers
source 1..127066
    /organism="Homo sapiens"
    /mol_type="genomic DNA"
    /db_xref="taxon:9606"
    /chromosome="2"
    /map="2"
    /clone="CTB-2367F13"
    /clone_lib="CITB Human BAC"
repeat_region complement(8..170)
    /rpt_family="MER47A"
repeat_region 171..468
    /rpt_family="AluSx"
repeat_region complement(469..516)
    /rpt_family="MER47A"
repeat_region 988..1049
    /rpt_family="MIR"
repeat_region complement(1294..1448)
    /rpt_family="L1ME4A"
repeat_region complement(2662..2954)
    /rpt_family="AluSx"
repeat_region 4049..4431
    /rpt_family="L2"
unsure 5261..5269
    /note="<30 qual SNGL region"
unsure 7192..7202
    /note="<30 qual SNGL region"
repeat_region 7310..7472
    /rpt_family="MIR"
repeat_region 7488..7582
    /rpt_family="MIR"



repeat_region 7589..7628
    /rpt_family="(TTG)n"
repeat_region complement(7631..7781)
    /rpt_family="AluSg/x"
repeat_region 7791..7922
    /rpt_family="MIR"
repeat_region complement(7977..8300)
    /rpt_family="AluJb"
repeat_region 9044..9343
    /rpt_family="AluSq"
repeat_region 10315..10344
    /rpt_family="AT_rich"
repeat_region 10355..10681
    /rpt_family="AluJo"
repeat_region 10683..10993
    /rpt_family="AluSx"
repeat_region complement(12221..12282)
    /rpt_family="MIR3"
repeat_region complement(12306..12449)
    /rpt_family="MIR"
repeat_region complement(13008..13189)
    /rpt_family="MER33"
repeat_region complement(13190..13471)
    /rpt_family="AluJo"
repeat_region complement(13472..13612)
    /rpt_family="MER33"
repeat_region 13899..13922
    /rpt_family="GC_rich"
repeat_region complement(14184..14250)
    /rpt_family="L2"
repeat_region 14552..14630
    /rpt_family="MER5A"
repeat_region complement(14809..15100)
    /rpt_family="AluSx"
repeat_region complement(15363..15679)
    /rpt_family="AluY"
repeat_region complement(15681..15979)
    /rpt_family="AluSx"
repeat_region 16292..16388
    /rpt_family="L2"
repeat_region 16392..16508
    /rpt_family="MLT1F"
repeat_region complement(16538..16616)
    /rpt_family="LTR37B"
repeat_region 16618..16687
    /rpt_family="Alu"
repeat_region complement(16988..17104)
    /rpt_family="L2"
repeat_region 17540..17895
    /rpt_family="MLT1A1"
repeat_region complement(17911..18209)
    /rpt_family="AluSq"
repeat_region 18487..18680
    /rpt_family="LTR16A1"
repeat_region 18802..19026
    /rpt_family="AluJo"
repeat_region complement(19092..19390)
    /rpt_family="AluJo"
repeat_region complement(21369..21675)
    /rpt_family="AluSx"
repeat_region complement(22474..22763)
    /rpt_family="MER115"
repeat_region complement(22843..22942)
    /rpt_family="MER115"
repeat_region 23239..23311
    /rpt_family="L3"
repeat_region complement(23968..24265)

Query Match 27.1%; Score 724; DB 9; Length 127066; 

Best Local Similarity 90.0%; Pred. No. 9e-150; 

Matches 824; Conservative 0; Mismatches 5; Indels 87; Gaps 1; 

Qy 1841 GCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 1900 

II I I II I I I I I I I I I I I II II I I I I I I I I I I I I I II I II I I I M I I M M I II I I 
Db 59498 GCTGTCTGTCTCCAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 59557 

Qy 1901 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 1960 

M I I I I I I I I I I I I I I I I I I I I I I I I I I M M I II I I I I I I I II I I I I I I I I I I I I I I I I 
Db 59558 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 59617 

Qy 1961 CCATCGCGGTCTCAGGAGATAAA 1983 

I I II I I I I M I M I I I I I I I II I 
Db 59618 CCATCGCGGTCTCAGGAGATAAAGTAAGCGGGGAAGGCCTCGGGTTCTAAATTATTGGAC 59677 

Qy 1984 ATCCTCAGTG 1993 

I I I I II I I I I 

Db 59678 GTCCGGCTTTCCATCCTCCTCATGAGCCCACTGCATGTCTGTGTCTCCAGATCCTCAGTG 59737 

Qy 1994 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 2053 

I I I I I I I I I I I I I I I I I I I I II II I I I I M I II I I I I I I I I I II I I I II I I I II I I I I I I 
Db 59738 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 59797

Qy 2054 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 2113

I M I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I M I I II I I I I I I I I I 
Db 59798 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 59857

Qy 2114 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 2173 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I M 
Db 59858 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 59917 

Qy 2174 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 2233 

I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I 
Db 59918 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 59977 

Qy 2234 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 2293 

I I I I II I I I I I I I I I I I I I I I I I M I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I 
Db 59978 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 60037 

Qy 2294 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAA 2353

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I II I I 
Db 60038 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAA 60097 



Qy 2354 CCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATAT 2413

I I I I I I M I II I I I I I I I I I II I I M I II M I I I I I I I II I M I II I I I I I I I I II II I I
Db 60098 CCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATAT 60157



Qy 2414 AGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAG 2473 

I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 60158 AGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAG 60217 

Qy 2474 GAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCA 2533

I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I M I I I I I I I I I 
Db 60218 GAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCA 60277

Qy 2534 GGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAG 2593 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I M 
Db 60278 GGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAG 60337 

Qy 2594 CAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAAT 2653 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 60338 CAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAAT 60397 

Qy 2654 TAAAAATGTATTGAGC 2669 

I I I I I I I I I I I I I I I I 
Db 60398 TAAAAATGTATTGAGC 60413 
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DEFINITION 
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VERSION 
KEYWORDS 
SOURCE 
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AUTHORS 
TITLE 
JOURNAL 

REFERENCE 
AUTHORS 



HTG 27-MAR-2003 
unordered 



AC087053 182261 bp DNA linear 

Homo sapiens chromosome 2 clone RP11-959M3 map 2, 
pieces . 
AC087053 

AC087053.13 GI:25140148

HTG; HTGS_PHASE1; HTGS_FULLTOP; HTGS_CANCELLED.
Homo sapiens (human) 
Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

1 (bases 1 to 182261) 
Birren,B., Nusbaum,C. and Lander,E.

Homo sapiens chromosome 2, clone RP11-959M3 
Unpublished 

2 (bases 1 to 182261) 

Birren,B., Linton,L., Nusbaum,C., Lander,E., Allen,N., Anderson,S.,
Barna,N., Bastien,V., Boguslavkiy, L . , Boukhgalter , B . , Brown, A. , 
Camarata,J., Campopiano,A. , Choepel,Y., Colangelo, M. , Collins, S., 
Collymore, A. , Cooke, P., DeArellano, K. , Dewar,K., Diaz, J. S., 
Dodge, S., Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., 
Gardyna,S., Ginde,S., Goyette,M. , Graham,L., Grand-Pierre, N . , 
Hagos,B., Heaford,A. , Horton,L., Hulme,W., Iliev,I., Johnson, R., 
Jones, C, Karatas,A. , LaRocque,K., Lamazares , R. , Landers, T., 
Lehoczky,J., Levine,R., Liu,G., MacLean,C., Macdonald,P.,
Marquis, N., Matthews, C, McCarthy, M., McEwan,P., McKernan,K., 
McPheeters, R. , Meldrim,J., Meneus,L., Mihova,T., Mlenga,V., 
Murphy, T., Naylor,J., Nguyen, C, Norbu,C., Norman, C.H,, 
O'Connor,T., O'Donnell,P., O'Neil,D., Oliver,J., Peterson,K.,
Phunkhang, P. , Pierre, N., Pollara,V., Raymond, C, Retta,R., 
Rieback,M., Riley, R., Rogov, P., Roman, J., Rosetti,M., Roy, A., 
Santos, R,, Schauer,S., Seaman, S., Severy,P,, Sougnez,C., 
Spencer,B., Stange-Thomann,N., Stojanovic,N., Strauss,N.,



Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J., Travers,M.,
Travis,N., Trigilio,J., Vassiliev,H., Viel,R., Vo,A., Wilson,B.,
Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J., Zembek,L.,
Zimmer,A. and Zody,M.
TITLE Direct Submission 

JOURNAL Submitted (30-NOV-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA 
REFERENCE 3 (bases 1 to 182261) 

AUTHORS Birren,B., Nusbaum,C., Lander, E., Ali,A., Allen, N., Anderson, S., 
Barna,N., Bastien,V., Bloom,T., Boguslavkiy, L, , Boukhgalter, B. , 
Camarata,J., Chang, J., Chazaro,B., Choepel,Y., Collymore, A. , 
Cook, A., Cooke, P., DeArellano, K. , Dewar,K., Diaz, J. S,, Dodge, S., 
Faro,S., Ferreira,P., FitzGerald,M., Gage,D., Galagan,J.,
Gardyna,S., Gord,S,, Graham,L., Grand- Pierre, N . , Hafez, N., 
Hagos,B., Horton,L., Hulme,W. , Iliev, I., Johnson, R., Jones, C, 
Kamat,A., Karatas,A., Kells,C., Landers, T., Levine,R., 
Lindblad-Toh,K. , Liu,G., MacLean,C., Macdonald, P . , Major, J., 
Matthews, C, McCarthy, M., Meldrim, J., Meneus,L., Mihova,T., 
Mlenga,V., Murphy, T., Naylor,J., Nguyen, C, Nicol,R., Norbu,C., 
Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D., Oliver,J.,
Peterson, K., Phunkhang, P . , Pierre, N., Raymond, C, Retta,R., 
Rise,C., Rogov, P,, Roman, J., Roy,A. , Schauer,S., Schupback, R. , 
Seaman, S., Severy,P., Smith, C, Spencer, B., Stange-Thomann, N . , 
Stojanovic,N. , Talamas,J., Tesfaye,S., Theodore, J., Topham, K., 
Travers,M., Vassiliev, H . , Viel,R., Vo,A. , Wilson, B., Wu,X., 
Wyman,D., Young, G., Zainoun, J., Zembek,L., Zimmer,A. and Zody,M. 

TITLE Direct Submission 

JOURNAL Submitted (27-MAR-2003) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA 
COMMENT On Nov 20, 2002 this sequence version replaced gi: 25046635. 

All repeats were identified using RepeatMasker : 
Smit, A.F.A. & Green, P. (1996-1997) 

http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center 

Center: Whitehead Institute/ MIT Center for Genome Research 

Center code: WIBR 

Web site: http://www-seq.wi.mit.edu 

Contact: sequence_submissions@genome.wi.mit.edu

Project Information 

Center project name: L11589 
Center clone name: 959 M 3 



NOTE: This is a 'working draft' sequence. It currently
consists of 9 contigs. The true order of the pieces
is not known and their order in this sequence record is 
arbitrary. Gaps between the contigs are represented as 
runs of N, but the exact sizes of the gaps are unknown. 
This record will be updated with the finished sequence 
as soon as it is available and the accession number will 
be preserved. 

1 78629: contig of 78629 bp in length
78630 78729: gap of 100 bp
78730 94580: contig of 15851 bp in length
94581 94680: gap of 100 bp
94681 112894: contig of 18214 bp in length
112895 112994: gap of 100 bp
112995 140439: contig of 27445 bp in length
140440 140539: gap of 100 bp
140540 162280: contig of 21741 bp in length
162281 162380: gap of 100 bp
162381 166673: contig of 4293 bp in length
166674 166773: gap of 100 bp
166774 176335: contig of 9562 bp in length
176336 176435: gap of 100 bp
176436 180345: contig of 3910 bp in length
180346 180445: gap of 100 bp
180446 182261: contig of 1816 bp in length.
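
The contig and gap layout given in the working-draft note can be checked mechanically. The following is a minimal Python sketch, not part of the original record: the names `LAYOUT`, `parse_layout` and `check_layout` are hypothetical, and the row pattern assumes the "start end: contig|gap of N bp" wording used in the note. It confirms that the intervals tile the record without overlap, that each stated length equals end - start + 1, and that the final coordinate equals the record length of 182261 bp.

```python
import re

# Layout rows as stated in the working-draft note for AC087053.
LAYOUT = """\
1 78629: contig of 78629 bp in length
78630 78729: gap of 100 bp
78730 94580: contig of 15851 bp in length
94581 94680: gap of 100 bp
94681 112894: contig of 18214 bp in length
112895 112994: gap of 100 bp
112995 140439: contig of 27445 bp in length
140440 140539: gap of 100 bp
140540 162280: contig of 21741 bp in length
162281 162380: gap of 100 bp
162381 166673: contig of 4293 bp in length
166674 166773: gap of 100 bp
166774 176335: contig of 9562 bp in length
176336 176435: gap of 100 bp
176436 180345: contig of 3910 bp in length
180346 180445: gap of 100 bp
180446 182261: contig of 1816 bp in length.
"""

# Assumed row shape: "<start> <end>: contig|gap of <length> bp"
ROW = re.compile(r"(\d+)\s+(\d+):\s+(contig|gap) of (\d+) bp")


def parse_layout(text):
    """Return a list of (start, end, kind, length) tuples, one per row."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            start, end, kind, length = m.groups()
            rows.append((int(start), int(end), kind, int(length)))
    return rows


def check_layout(rows, total_length):
    """Verify the rows tile 1..total_length; return the number of contigs."""
    expected_start = 1
    for start, end, kind, length in rows:
        if start != expected_start:
            raise ValueError(f"row starting at {start} does not follow {expected_start - 1}")
        if end - start + 1 != length:
            raise ValueError(f"{kind} {start}..{end} does not span {length} bp")
        expected_start = end + 1
    if expected_start - 1 != total_length:
        raise ValueError("layout does not end at the record length")
    return sum(1 for row in rows if row[2] == "contig")


rows = parse_layout(LAYOUT)
contigs = check_layout(rows, 182261)
```

Under these assumptions the check passes only if the nine contigs and eight 100-bp gaps account for every base of the record.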



FEATURES Location/Qualifiers
source 1..182261
    /organism="Homo sapiens"
    /mol_type="genomic DNA"
    /db_xref="taxon:9606"
    /chromosome="2"
    /map="2"
    /clone="RP11-959M3"
    /clone_lib="RPCI-11 Human Male BAC"



ORIGIN 



Query Match 27.1%; Score 724; DB 2; Length 182261;

Best Local Similarity 90.0%; Pred. No. 8.8e-150;

Matches 824; Conservative 0; Mismatches 5; Indels 87; Gaps 1;



Qy 1841 GCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 1900

II I II II I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 6870 GCTGTCTGTCTCCAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 6929

Qy 1901 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 1960

I I I I I I I I I I I I I I I I I I II I I I I II I I II I I I II I I I II I I I I I I I I I I I I I I I I I I I I
Db 6930 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 6989

Qy 1961 CCATCGCGGTCTCAGGAGATAAA 1983

I I I I II I I I I I I I I II I I I I I I I
Db 6990 CCATCGCGGTCTCAGGAGATAAAGTAAGCGGGGAAGGCCTCGGGTTCTAAATTATTGGAC 7049

Qy 1984 ATCCTCAGTG 1993

I M I I I I I I I
Db 7050 GTCCGGCTTTCCATCCTCCTCATGAGCCCACTGCATGTCTGTGTCTCCAGATCCTCAGTG 7109

Qy 1994 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 2053

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I II I I I I I I
Db 7110 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 7169

Qy 2054 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 2113

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 7170 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 7229

Qy 2114 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 2173

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 7230 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 7289

Qy 2174 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 2233

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 7290 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 7349



Qy 2234 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 2293 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 7350 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 7409

Qy 2294 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAA 2353

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 7410 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAA 7469

Qy 2354 CCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATAT 2413 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 7470 CCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATAT 7529

Qy 2414 AGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAG 2473 

I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 7530 AGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAG 7589 

Qy 2474 GAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCA 2533 

I I I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 
Db 7590 GAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCA 7649 

Qy 2534 GGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAG 2593 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 7650 GGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAG 7709 

Qy 2594 CAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAAT 2653

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 7710 CAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAAT 7769

Qy 2654 TAAAAATGTATTGAGC 2669

I I I I I I I I I I I I I I I I 
Db 7770 TAAAAATGTATTGAGC 7785



RESULT 12

AC108476

LOCUS AC108476 139342 bp DNA linear PRI 16-APR-2002
DEFINITION Homo sapiens BAC clone RP11-1413K20 from 2, complete sequence.
ACCESSION AC108476
VERSION AC108476.5 GI:19807988
KEYWORDS HTG.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
    Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 139342)
AUTHORS Sulston,J.E. and Waterston,R.
TITLE Toward a complete human genome sequence
JOURNAL Genome Res. 8 (11), 1097-1108 (1998)
MEDLINE 99063792
PUBMED 9847074
REFERENCE 2 (bases 1 to 139342)
AUTHORS Harkins,C., Haakenson,W. and Doebber,A.
TITLE The sequence of Homo sapiens BAC clone RP11-1413K20
JOURNAL Unpublished (2001)
REFERENCE 3 (bases 1 to 139342)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (27-JAN-2002) Genome Sequencing Center, Washington
    University School of Medicine, 4444 Forest Park Parkway, St. Louis,
    MO 63108, USA
REFERENCE 4 (bases 1 to 139342)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (20-FEB-2002) Genome Sequencing Center, Washington
    University School of Medicine, 4444 Forest Park Parkway, St. Louis,
    MO 63108, USA
REFERENCE 5 (bases 1 to 139342)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (29-MAR-2002) Genome Sequencing Center, Washington
    University School of Medicine, 4444 Forest Park Parkway, St. Louis,
    MO 63108, USA
REFERENCE 6 (bases 1 to 139342)
AUTHORS Waterston,R.
TITLE Direct Submission
JOURNAL Submitted (16-APR-2002) Department of Genetics, Washington
    University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
COMMENT On Mar 29, 2002 this sequence version replaced gi:18767626.
Genome Center 

Center: Washington University Genome Sequencing Center 

Center code: WUGSC 

Web site: http://genome.wustl.edu/gsc 
Contact: sapiens@watson.wustl.edu 

Summary Statistics 

Center project name: H NH1413K20 



NOTICE: This sequence may not represent the entire insert of this 
clone. It may be shorter because we only sequence overlapping 
clone sections once, or longer because we provide a small overlap 
between neighboring data submissions. 

This sequence was finished as follows unless otherwise noted: 
all regions were double stranded, sequenced with an alternate 
chemistry, or covered by high quality data (i.e., phred quality >=
30); an attempt was made to resolve all sequencing problems, such
as compressions and repeats; all regions were covered by sequence 
from more than one subclone; and the assembly was confirmed by 
restriction digest. 

MAPPING INFORMATION: 

Mapping information for this clone was provided by Dr. John D. 
McPherson, Department of Genetics, Washington University, St. Louis 
MO. For additional information about the map position of this 
sequence, see http://genome.wustl.edu/gsc

SOURCE INFORMATION: 

The RPCI-11 human BAC library was made from the blood of one male 
donor, as described by Osoegawa,K., Woon,P.Y., Zhao,B., Frengen,E., 
Tateno,M., Catanese, J. J. and de Jong, P.J. (1998) An improved 
approach for construction of bacterial artificial chromosome 
libraries. Genomics 51:1-8. The clone may be obtained either from 
Research Genetics, Inc. (http://www.resgen.com) or Pieter de Jong
and coworkers at http://www.chori.org 



VECTOR: pBACe3.6



NEIGHBORING SEQUENCE INFORMATION:

The clone sequenced to the left is RP11-489K22, 2000 bp overlap.
Actual end is at base position 139342 of RP11-1413K20.

The region between 132012 and 132017 is covered only by a PCR
product of clone DNA.

FEATURES Location/Qualifiers
source 1..139342
    /organism="Homo sapiens"
    /mol_type="genomic DNA"
    /db_xref="taxon:9606"
    /chromosome="2"
    /map="2"
    /clone="RP11-1413K20"
    /clone_lib="RPCI-11"
misc_feature 55..655
    /note="match to EST AA203458 (NID:g1799169) zx58b04.r1"
misc_feature 93..286
    /note="match to EST AV689089 (NID:g10290952)"
misc_feature 93..286
    /note="similar to Mus musculus EST AI597378 (NID:g4606426)
    vj29c06.y1"
misc_feature 93..279
    /note="match to EST AV660973 (NID:g9881987)"
misc_feature 318..653
    /note="match to EST R00405 (NID:g750141) ye71e05.r1"
misc_feature 372..633
    /note="similar to Homo sapiens EST T97887 (NID:g747232)
    ye58h05.r1"
misc_feature 706..708
    /note="match to EST R00405 (NID:g750141) ye71e05.r1"
misc_feature 706..707
    /note="similar to Homo sapiens EST T97887 (NID:g747232)
    ye58h05.r1"
repeat_region 847..1139
    /rpt_family="Alu"
misc_feature 1867..2047
    /note="match to EST T39945 (NID:g647612) ya13g04.r1"
repeat_region 2234..2616
    /rpt_family="L2"
misc_feature 2983..3121
    /note="match to EST AV689089 (NID:g10290952)"
misc_feature 2983..3121
    /note="similar to Mus musculus EST AI597378 (NID:g4606426)
    vj29c06.y1"
misc_feature 3044..3121
    /note="match to EST T86384 (NID:g714736) yd77b08.r1"
misc_feature 4099..4304
    /note="match to EST T86384 (NID:g714736) yd77b08.r1"
misc_feature 4099..4283
    /note="match to EST AV689089 (NID:g10290952)"
misc_feature 4401..4618
    /note="similar to Mus musculus EST BF162656
    (NID:g11042879)"
misc_feature 4405..4454
    /note="match to EST T86384 (NID:g714736) yd77b08.r1"
misc_feature 4724..5110
    /note="similar to Homo sapiens EST AV656623
    (NID:g9877637)"
misc_feature 5075..5204
    /note="similar to Mus musculus EST BF162656
    (NID:g11042879)"
repeat_region 5495..5657
    /rpt_family="MIR"
repeat_region 5673..5767
    /rpt_family="MIR"
repeat_region 5774..5813
    /rpt_family="(TTG)n"
repeat_region 5816..5958
    /rpt_family="Alu"
repeat_region 5976..6091
    /rpt_family="MIR"
repeat_region 6162..6485
    /rpt_family="Alu"
misc_feature 6351..6373
    /note="match to EST AA228345 (NID:g1849916) nc39d04.s1"
misc_feature 6352..6364
    /note="match to EST AI431309 (NID:g4302284) ar55b01.x1"
misc_feature 6352..6364
    /note="match to EST AI469772 (NID:g4331862) tm20f11.x1"
misc_feature 6353..6367
    /note="match to EST AI241685 (NID:g3837082) qu70f06.x1"
misc_feature 6568..6707
    /note="similar to Mus musculus EST BF162656
    (NID:g11042879)"
misc_feature 6649..6707
    /note="similar to Mus musculus EST BB598373
    (NID:g16450340)"
repeat_region 7229..7528
    /rpt_family="Alu"
misc_feature 7940..8549
    /note="similar to EST BM725726 (NID:g19047059)"
misc_feature 8169..8305
    /note="similar to Mus musculus EST BF162656
    (NID:g11042879)"
misc_feature 8169..8301
    /note="similar to Mus musculus EST BB598373
    (NID:g16450340)"
repeat_region 8500..8529
    /rpt_family="AT_rich"
repeat_region 8540..8868
    /rpt_family="Alu"
repeat_region 8870..9180
    /rpt_family="Alu"
repeat_region 10493..10636
    /rpt_family="MIR"
repeat_region 11195..11376
    /rpt_family="MER1_type"
repeat_region 11377..11658
    /rpt_family="Alu"
repeat_region 11659..11799
    /rpt_family="MER1_type"
misc_feature 11955..12053
    /note="similar to Mus musculus EST BB598373
    (NID:g16450340)"
misc_feature 11994..12053
    /note="similar to Mus musculus EST AA239884 (NID:g1863923)
    mxSldOl.rl"
repeat_region 12086..12109

Query Match 27.1%; Score 722.4; DB 9; Length 139342; 

Best Local Similarity 89.8%; Pred. No. 2e-149; 

Matches 823; Conservative 0; Mismatches 6; Indels 87; Gaps 1; 

Qy 1841 GCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 1900 

II I I II I I I I I II I I II I I I I I I I I I I I I I II I I II I I I I I I M I I II I I I M II 
Db 57732 GCTGTCTGTCTCCAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 57791 

Qy 1901 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 1960

I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I II M I I I I I I M I I I I I II I I I 
Db 57792 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 57851 

Qy 1961 CCATCGCGGTCTCAGGAGATAAA 1983 

I I I I I I II I I I M I I I I I I II II 
Db 57852 CCATCGCGGTCTCAGGAGATAAAGTAAGCGGGGAAGGCCTCGGGTTCTAAATTATTGGAC 57911

Qy 1984 ATCCTCAGTG 1993 

I I I I I I I II I 

Db 57912 GTCCGGCTTTCCATCCTCCTCATGAGCCCACTGCATGTCTGTGTCTCCAGATCCTCAGTG 57971 

Qy 1994 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 2053

I I II I M I I I I I I I I M I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I 

Db 57972 TCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 58031 

Qy 2054 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 2113 

I I I I I I I I I I I I I I I I I I I I I I I M I II I I I I M I I I II I I I I I II I I I I I I I I I I I I I I 
Db 58032 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 58091

Qy 2114 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 2173 

I I I I I M I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I 
Db 58092 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 58151 

Qy 2174 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 2233 

I I I I I I I I I I I I I I I I I I I I I I I II I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 58152 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 58211 

Qy 2234 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 2293 

I I M I I II I I I I I I I I I I I I I I I I II II I I I I I I I I I I II I I I I I I I I M I II I I I I I I I 
Db 58212 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 58271 

Qy 2294 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAA 2353 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I II I I I I 

Db 58272 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAA 58331 

Qy 2354 CCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATAT 2413 

I I I II I I M I I I I M I I II I II I I I I I M I I II I I II I I I I I I I I I I II M I I I I I I I I I 
Db 58332 CCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATAT 58391 

Qy 2414 AGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAG 2473



I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I M I I I I I I I M I 

Db 58392 AGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAG 58451 



Qy 2474 GAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCA 2533 

I I I I I I I I M I I M I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 58452 GAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCA 58511

Qy 2534 GGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAG 2593

I I I I I I M I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 58512 GGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAG 58571 

Qy 2594 CAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAAT 2653 

I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 58572 CAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAAT 58631

Qy 2654 TAAAAATGTATTGAGC 2669 

I I I I I I I I I I I I I I I I 
Db 58632 TAAAAATGTATTGAGC 58647 



RESULT 13 

F351812S13 

LOCUS       F351812S13              2201 bp    DNA     linear   PRI 10-AUG-2001
DEFINITION  Homo sapiens sterolin-2 (ABCG8) gene, exon 13 and complete cds.
ACCESSION   AF351824
VERSION     AF351824.1  GI:15146443
KEYWORDS
SEGMENT     13 of 13
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2201)
  AUTHORS   Lu,K., Lee,M.H., Hazard,S., Brooks-Wilson,A., Hidaka,H., Kojima,H.,
            Ose,L., Stalenhoef,A.F., Mietinnen,T., Bjorkhem,I., Bruckert,E.,
            Pandya,A., Brewer,H.B. Jr., Salen,G., Dean,M., Srivastava,A. and
            Patel,S.B.
  TITLE     Two genes that map to the STSL locus cause sitosterolemia: genomic
            structure and spectrum of mutations involving sterolin-1 and
            sterolin-2, encoded by ABCG5 and ABCG8, respectively
  JOURNAL   Am. J. Hum. Genet. 69 (2), 278-290 (2001)
  MEDLINE   21344600
   PUBMED   11452359
REFERENCE   2  (bases 1 to 2201)
  AUTHORS   Lu,K.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-FEB-2001) Division of Endocrinology, Diabetes and
            Medical Genetics, Medical University of South Carolina, 114 Doughty
            St, STB 541, Charleston, SC 29403, USA
FEATURES             Location/Qualifiers
     source          1..2201
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="2"
                     /map="between D2S2294 and D2S2298"
                     /clone="108lG2; 32814"



gene 



mRNA 



.163, 



CDS 



.163, 



/cell_type="ES cell"

order {AF3518 12. 1:<1227. . 2 809, AF351813 . 1 : 1 . .4 665, 
AF351814, 1: 1. . 2368, AF351815 . 1 : 1 . . 1323, AF351816 . 1 
.660,AF351818.1:1. 
.685,AF351821.1:1. 
.859,1. .>182) 



.203,AF351819. 
.884,AF351822. 



AF351817.1:1. 
AF351820.1:1. 
AF351823. 1:1. 
/gene="ABCG8" 

join(AF351812. 1:<1227. . 1289, AF351813 . 1 : 3941 . 
AF3518 14. 1:924. . 1080, AF351815 . 1 : 674 . .912, 
AF3518 16. 1:121. . 253, AF351817 . 1 : 66 . . 335, AF351818 . 1 : 1 . 



, .300, 
,888, 
,1292, 



.4042, 



AF351819. 1:48. . 128, AF351820 . 1 : 310 . .509, 
AF351821. 1:243. . 319, AF351822 . 1 : 101 . .368, 
AF351823. 1:689. .816,45. .>182) 
/gene="ABCG8" 
/product="sterolin-2" 

join (AF351812. 1:1227. . 1289, AF351813 . 1 : 3941 . .4042, 
AF3518 14. 1:924. . 108 0, AF351815 . 1 : 674 . .912, 
AF351816. 1: 121. . 253, AF351817 . 1: 66. . 335 , AF3518 18 . 1 : 



1. 



AF351819. 1:48. . 128, AF351820 . 1 : 310 . .509, 

AF351821. 1:243. . 319 , AF351822 . 1 : 101 . .368, 

AF351823. 1:689. ,816,45. .182) 

/gene="ABCG8"
/note="HABCG8"
/codon_start=1
/product="sterolin-2"
/protein_id="AAK84663.1"
/db_xref="GI:15146444"

/translation="MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQP 

NTLEVRDLNCQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLA 

IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPN 

LTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER 

RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI 

FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQ 

EIATREKAQSIJE\ALFLEKVRDLDDFLWKAETKDLDEDTCVESVTPLDTNCLPSPTK^ 

GAVQQFKTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAAL 

LFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYII 

lYGMPTYWIJU^LRPGLQPFLLHFLLVWLWFCCRIMALAAAALLPTFHMASFFSNALY 

NSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 

GDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW" 

exon 45..>182
/gene="ABCG8"
/number=13



ORIGIN 



Query Match 24.9%; Score 663.6; DB 9; Length 2201; 

Best Local Similarity 99.3%; Pred. No. 2.7e-136; 

Matches 685; Conservative 2; Mismatches 1; Indels 2; Gaps 2; 

Qy 1982 AAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCA 2041 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
Db 43 AGATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCA 102 



Qy 2042 TTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGA 2101
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 103 TTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGA 162



Qy 2102 AACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCA 2161 

I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I M M M I M I I I I I I I I I I 

Db 163 AACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCA 222 

Qy 2162 GACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGAC 2221 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 
Db 223 GACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGAC 282

Qy 2222 CCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGG 2281 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I 
Db 283 CCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCRGTGGCACAGACCAGCCACAGG 342 

Qy 2282 ATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATG 2341 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I I I I I M 
Db 343 ATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATG 402

Qy 2342 ACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGAT 2401 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 403 ACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGAT 462

Qy 2402 ATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGC 2461

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 463 ATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGC 522

Qy 2462 TAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGG 2521 

I I I I I I I I I I I M I I I I I I I I I I I I I M I M I I I I I I M I I I I I I I : I I I I I I I I I I I I I 

Db 523 TAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTASCTAGCAGATTTGG 582 

Qy 2522 CTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCT 2581 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I 
Db 583 CTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCT 642 

Qy 2582 -AAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I 
Db 643 NAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 702

Qy 2641 CAACGTGAACAA-TTAAAAATGTATTGAGC 2669 

I I I I I I I I I M I I I I I I I I I I I I I I I I I I 

Db 703 CAACGTGAACAANTTAAAAATGTATTGAGC 732 



RESULT 14 

AC146466 

LOCUS       AC146466              185045 bp    DNA     linear   HTG 15-AUG-2003
DEFINITION  Callithrix jacchus clone CH259-274K20, WORKING DRAFT SEQUENCE, 3
            ordered pieces.
ACCESSION   AC146466
VERSION     AC146466.1  GI:33667132
KEYWORDS    HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE      Callithrix jacchus (white-tufted-ear marmoset)
  ORGANISM  Callithrix jacchus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Platyrrhini; Callitrichidae;
            Callithrix.
REFERENCE   1  (bases 1 to 185045)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 185045)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-AUG-2003) Genome Sciences, Lawrence Berkeley National
            Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT     Sequence Produced by Berkeley PGA
            Web site: http://pga.lbl.gov
            Center Code: PGABERK
            Center Project Name: J027
            Bac Clone Name: CH259-274K20
            This sequence has been compared to sequences of other species
            using Vista (http://www-gsd.lbl.gov/VISTA). The results can be
            viewed at:
            http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5
            The order-orientation of the draft sequence was accomplished by
            using:
            Avid (http://baboon.math.berkeley.edu/mavid),
            Lagan (http://lagan.stanford.edu/) and paired end information.
            Funding agent: Programs for Genomic Applications (NHLBI)
            Summary Statistics:
            Sequencing vector: Plasmid; pUC18
            Chemistry: Dye-terminator Big Dye
            Assembly program: Phrap version 0.990329.
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 3 contigs. Gaps between the contigs
            * are represented as runs of N. The order of the pieces
            * is believed to be correct as given, however the sizes
            * of the gaps between them are based on estimates that have
            * provided by the submitter.
            * This sequence will be replaced
            * by the finished sequence as soon as it is available and
            * the accession number will be preserved.
            *        1    49109: contig of 49109 bp in length
            *    49110    49209: gap of unknown length
            *    49210    57420: contig of 8211 bp in length
            *    57421    57520: gap of unknown length
            *    57521   185045: contig of 127525 bp in length.
FEATURES             Location/Qualifiers
     source          1..185045
                     /organism="Callithrix jacchus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9483"
                     /clone="CH259-274K20"
ORIGIN



Query Match 21.8%; Score 581.2; DB 2; Length 185045;

Best Local Similarity 83.0%; Pred. No. 4.1e-118;

Matches 762; Conservative 0; Mismatches 63; Indels 93; Gaps 5;



Qy 1841 GCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 1900

II I II I I I I I I II II I I II II I I I I I I I I M I I I I I M I II II II I I I I I I II 
Db 139563 GCTCTCTGTCTCCAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGATGGTGTTTTG 139622

Qy 1901 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 1960

II I I I I I I I I I I I I II I I II I I II I I II III I I I I I I I I I I I I I I I I II III III 

Db 139623 AAGGGCTGATGAAGATTCAGTTCAGCAGCAGAGCTTATAAAATGCCTCTGGGCAACTTCA 139682

Qy 1961 CCATCGCGGTCTCAGGAGATAAA 1983 

I I I I I I III I I I I I I I II I I 
Db 139683 CCATCCCAGTCCCAGGAGATAAAGTAAGCGGGGAAGGCCTCAGGTTCTAAATGACTGGAT 139742

Qy 1984 ATCCTCAGTG 1993 

I I I I I I I I I I 

Db 139743 GTCCGGCTGCCTGTCCTCCTAATGAGCCCACTGCATGTCTGTGTCTCCAGATCCTCAGTG 139802

Qy 1994 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 2053 

I II II I I I I I I I I I I I II I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 139803 CCATGGAGCTGAACTCGTACCCTCTCTACGTCATCTACCTCATCGTCATTGGCCTCAGCG 139862

Qy 2054 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 2113 

II II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 139863 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 139922

Qy 2114 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 2173 

I I I M I I II I II III I I I I II II II I I I I I I I I III I I I I I I I I II I I II I I 
Db 139923 ACTGGTGATTCATGCCGGGTGCCTGCCCGCTGGTGGGGCACCCAAGCAGACCCTTCAACT 139982

Qy 2174 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 2233 

I I I I I M I I M I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II III III 

Db 139983 GCACTCCTTCCTCAGAAGCCCCTTCCTGGGGACAATGAGGACAATGACCCTA-AGAAGCT 140041

Qy 2234 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 2293 

II I I I I I I I I I I II I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I 

Db 140042 CAGCTACATCCGGCCCACAGTGCTGCAGTGGCACAGACCAACCACAGGATGGCAGTAGAA 140101

Qy 2294 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTG--CGATGACTGGGAGAA 2351

I I I I I I I I I I M M I II M I M I I I I I I I I I I I I I I I I II II II I II I I I 

Db 140102 TAAAGACAGTAGAAAGGGATTTCTGCTCACTGGCAGGAGACTGATGACTGGGAGTGAGAA 140161

Qy 2352 AACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTAT 2411 

I I I I I I I I I I I I I I M I I I M I M I I I I I I I I I I I I I I I I I I II I M I I I I I I II 
Db 140162 AACCTGCACTCAGTGGCGCCTACAACGTTGCTAATTTATTTCCTTTTGATATGTGCTTAT 140221



Qy 2412 ATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGC 2471 



I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I III 

Db 140222 ATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATAAATTGAGTAGCTAGAC--TGC 140279

Qy 2472 AGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTC 2531 

I I I I I I II II I I I II I I I I I I I I I I I II I I I I I III 

Db 140280 AGGAATTGTTAGAACCCAGAGAGAATAATAAAAGTAGCTAGCAGATCTGGCCTCATCTTC 140339

Qy 2532 CAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACC 2591 

I I M M I I I I I I MM I II II I M II II II I M M II M M II I I I I 

Db 140340 CAGGGGCCCCACACTTAGTGG-GAGCTATCATCAATACAGAAAGTGATCTAAGATGTACC 140398

Qy 2592 AGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACA 2651 

II II M II I II I I I I II I I M II I II II I II II II II I M I I I I 

Db 140399 AGCAAGATGCCACCCCTTCTTTTCGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACA 140458

Qy 2652 ATTAAAAATGTATTGAGC 2669 

II II II II I II II M II 
Db 140459 ATTAAAAATTTATTGAGC 140476 



RESULT 15 

AC146464/C 

LOCUS       AC146464              202533 bp    DNA     linear   HTG 19-AUG-2003
DEFINITION  Saimiri sciureus clone CH254-84A11, WORKING DRAFT SEQUENCE.
ACCESSION   AC146464
VERSION     AC146464.1  GI:33636782
KEYWORDS    HTG; HTGS_PHASE2; HTGS_DRAFT.
SOURCE      Saimiri sciureus (common squirrel monkey)
  ORGANISM  Saimiri sciureus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Platyrrhini; Cebidae; Cebinae;
            Saimiri.
REFERENCE   1  (bases 1 to 202533)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 202533)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-AUG-2003) Genome Sciences, Lawrence Berkeley National
            Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
REFERENCE   3  (bases 1 to 202533)
  AUTHORS   Cheng,J.-F., Hamilton,M., Peng,Y., Mukherjee,S., Hosseini,R.,
            Peng,Z., Malinov,I. and Rubin,E.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (19-AUG-2003) Genome Sciences, Lawrence Berkeley National
            Laboratory, 1 Cyclotron Rd., Berkeley, CA 94720, USA
COMMENT     Sequence Produced by Berkeley PGA
            Web site: http://pga.lbl.gov
            Center Code: PGABERK
            Center Project Name: S030
            Bac Clone Name: CH254-84A11
            This sequence has been compared to sequences of other species
            using Vista (http://www-gsd.lbl.gov/VISTA). The results can be
            viewed at:
            http://pga.lbl.gov/cgi-bin/search_cvcgd?type=n&value=ABCG5
            The order-orientation of the draft sequence was accomplished by
            using:
            Avid (http://baboon.math.berkeley.edu/mavid),
            Lagan (http://lagan.stanford.edu/) and paired end information.
            Funding agent: Programs for Genomic Applications (NHLBI)
            Summary Statistics:
            Sequencing vector: Plasmid; pUC18
            Chemistry: Dye-terminator Big Dye
            Assembly program: Phrap version 0.990329.
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 1 contigs. Gaps between the contigs
            * are represented as runs of N. The order of the pieces
            * is believed to be correct as given, however the sizes
            * of the gaps between them are based on estimates that have
            * provided by the submittor.
            * This sequence will be replaced
            * by the finished sequence as soon as it is available and
            * the accession number will be preserved.
            *        1   202533: contig of 202533 bp in length.
FEATURES             Location/Qualifiers
     source          1..202533
                     /organism="Saimiri sciureus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9521"
                     /clone="CH254-84A11"
ORIGIN

Query Match 21.6%; Score 576.4; DB 2; Length 202533; 

Best Local Similarity 82.7%; Pred. No. 4.7e-117; 

Matches 759; Conservative 0; Mismatches 66; Indels 93; Gaps 5; 

Qy 1841 GCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 1900 

II I II I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I II I II II I I I II I I 
Db 10147 GCTCTCTGTCTCCAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTG 10088 

Qy 1901 AAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCA 1960 

I I II I I I I I I I I I I I I I I I I I I I II I I I III I I I I I I I I I I I I I I I I II III III 
Db 10087 AAGGGCTGATGAAGATTCAGTTCAGCAGCAGAGCTTATAAAATGCCTCTGGGCAACTTCA 10028 

Qy 1961 CCATCGCGGTCTCAGGAGATAAA 1983 

I II I I I I I I I I I I I I I I I I I I 
Db 10027 CCATCCCGGTCCCAGGAGATAAAGTAAGCGGGGAAGGCCTCAGGTTCTAAATGACTGGAT 9968 

Qy 1984 ATCCTCAGTG 1993 

I I I I I I I I I I 

Db 9967 GTCCGGCTGCCTGTCCTCCTAACGAGCCCGCTGCATGTCTGTGTCTCCAGATCCTCAGTG 9908 



Qy 1994 CCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCG 2053 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 
Db 9907 CCATGGAGCTGAACTCGTACCCTCTCTACGCCATCTACCTTATCGTCATTGGTCTCAGCG 9848

Qy 2054 GTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAG 2113

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I 

Db 9847 GTGGCTTCATGGTCCTGTACTACATGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAGG 9788

Qy 2114 ACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACT 2173 

I I I I I I I I I I I I III I II I II I I I I I I I II I I I III I I I I I I I I I I I M I II I 

Db 9787 ACTGGTGATTCATGCCGGGCGCCTGCCCACTGGTGGGGAACCCGAGCAGACCCTTCAACT 9728

Qy 2174 GCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCT 2233 

II I II I I I I I I I I I I I I I I I I I I I I I I I I M I I II I II I I I I I I I I I I Ml III 

Db 9727 GCACTTTTTCCTCAGGAGCCCCTTCCTGGGGACAATGAGGACAATGACCCTA-AGAAGCT 9669 

Qy 2234 CAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAA 2293 

I I I I I I I M I I I I I I I I I I I I I I I I II I I II M I II I I I I I I I I I I I I I I I I I II I I 
Db 9668 CAGCTACATCCGGCCCACAGTGCTGCAGTGGCACAGGCCAGCCACAGGATGGCAGTAGAA 9609 

Qy 2294 TAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTG--CGATGACTGGGAGAA 2351

I I I I I I I I I I I I I I I I I I I II I I I I I II I I M II I I I M I I I ' II I I I I I I 
Db 9608 TAAAGACAGTCGAAAGGGATTTCTGCTCTCTGGCAGGAGACTGATGACTGGGAGTGAGAA 9549 

Qy 2352 AACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTAT 2411 

I I I M I I I M I II I I I M I I M I I II II I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 9548 AACCTGCACTCAGTGGCGCCCACAACGTTGCTAATTTATTTCCTTTTGATATGTGCTTAT 9489

Qy 2412 ATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGC 2471 

I I I I II I I II I I I I I I I I I II I I I I I I II I II I I I I I I I I I I I I I I I I I I I II III 

Db 9488 ATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATAAATTGAGTAGCTAGAC--TGC 9431

Qy 2472 AGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTC 2531

II I I I I I I I I I I I I I III I I I I I II I I I I I I M I I I I I I I I I II I I I I I I I I 

Db 9430 AGGAATTGTTAGAACCCAGAGAGAACAATAACAGTAGCTAGCAGATCTGATCTCATCTTC 9371 

Qy 2532 CAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACC 2591 

II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 9370 CAGGGGCCCCACACTTCGTGG-GAGCCATCATCAATACAGAACGTGACCTAAGATGTACC 9312 

Qy 2592 AGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACA 2651 

I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I 
Db 9311 AGCAAGATGCCACCCCTTCTGTTCGTGTGGGGTAATGGGCTCCAAAAGCCAACGTGAACA 9252 

Qy 2652 ATTAAAAATGTATTGAGC 2669 

I I I I I I I II I I I I I II 
Db 9251 ATTAAAAATTTATTGGGC 9234 



Search completed: February 26, 2004, 06:21:30
Job time : 6869.55 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model
Run on: February 26, 2004, 00:39:18 ; Search time 675.985 Seconds
(without alignments)
16773.223 Million cell updates/sec

Title: US-09-989-981A-7
Perfect score: 2669
Sequence: 1 gtgtccctgctccaggaaac..........caattaaaaatgtattgagc 2669

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 3373863 seqs, 2124099041 residues

Total number of hits satisfying chosen parameters: 6747726

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : N_Geneseq_29Jan04:*
1: geneseqn1980s:*
2: geneseqn1990s:*
3: geneseqn2000s:*
4: geneseqn2001as:*
5: geneseqn2001bs:*
6: geneseqn2002s:*
7: geneseqn2003as:*
8: geneseqn2003bs:*
9: geneseqn2003cs:*
10: geneseqn2004s:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
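
The percentage columns in this listing can be sanity-checked against the run parameters. The Python below is a minimal sketch, not part of GenCore, and its function names are hypothetical. It assumes that "Query Match" is the hit score as a percentage of the perfect score, and that "Best Local Similarity" is the number of matches over all aligned columns (matches, conservative substitutions, mismatches and indels). Both definitions are inferred from the figures printed in this report rather than taken from GenCore documentation.

```python
def query_match_pct(score, perfect_score):
    """Hit score as a percentage of the perfect (self-alignment) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Matches as a percentage of all aligned columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Figures copied from the AF351824 hit in this listing:
    # Score 663.6 against a perfect score of 2669;
    # Matches 685; Conservative 2; Mismatches 1; Indels 2.
    print(query_match_pct(663.6, 2669))
    print(best_local_similarity_pct(685, 2, 1, 2))
```

With the AF351824 figures these evaluate to 24.9 and 99.3, which agree with the "Query Match" and "Best Local Similarity" values printed for that hit.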



SUMMARIES 



Result            Query
   No.    Score   Match  Length  DB  ID         Description

     1     2669   100.0    2669   7  AAD48883   Aad48883 Human ABC
     2   1680.6    63.0    3239   6  ABK83218   Abk83218 Human tra
     3     1499    56.2    2564   6  ABN90022   Abn90022 Mouse clo
     4     1430    53.6    2019   7  AAD48881   Aad48881 Mouse ABC
     5    291.6    10.9     580   4  AAH98911   Aah98911 Arabidops
     6    203.6     7.6    1920   6  ABK51681   Abk51681 DNA encod
     7    203.6     7.6    2340   6  AAD22009   Aad22009 Human sit
     8    203.6     7.6    2340   7  AAD48882   Aad48882 Human ABC
     9    203.6     7.6    2516   6  ABK51682   Abk51682 Human ABC
    10    194.4     7.3    2035   6  ABK51686   Abk51686 cDNA enco
    11    193.4     7.2    1915   6  ABK51684   Abk51684 DNA encod
    12    193.4     7.2    1959   7  AAD48880   Aad48880 Mouse ABC
    13    193.4     7.2    2258   6  AAD22008   Aad22008 Mouse sit
    14    193.4     7.2    2354   6  ABK51685   Abk51685 Mouse ABC
    15    173.2     6.5    1069   6  ABK51687   Abk51687 cDNA enco
    16      172     6.4     363   6  ABN16253   Abn16253 Human ORF
    17    161.4     6.0    5460   6  ABK51683   Abk51683 Human ABC
    18    155.8     5.8    2525   3  AAZ98625   Aaz98625 Silkworm
    19      122     4.6    2025   6  ABA94371   Aba94371 Murine BC
    20      121     4.5    2446   3  AAC37975   Aac37975 Arabidops
    21    115.4     4.3    1968   6  AAL42412   Aal42412 Human BCR
    22    115.4     4.3    1968   9  ADC54181   Adc54181 Human bre
    23    115.4     4.3    1998   6  AAL42414   Aal42414 Human BCR
    24    115.4     4.3    2027   6  ABK49901   Abk49901 cDNA enco
    25    115.4     4.3    2053   6  ABK49911   Abk49911 cDNA enco
    26    115.4     4.3    2247   6  ABA94383   Aba94383 Human BCR
    27    115.4     4.3    2418   2  AAZ06360   Aaz06360 Breast Ca
    28    115.4     4.3    2574   4  AAF27724   Aaf27724 Human tra
    29    115.4     4.3    2574   8  ADA10916   Ada10916 Human cDN
    30    115.4     4.3    2718   7  ACC80605   Acc80605 Human ABC
    31    115.4     4.3    2719   3  AAZ94760   Aaz94760 Human ATP
    32    115.4     4.3    2719   3  AAA27938   Aaa27938 ATP-bindi
    33    115.4     4.3    2719   6  ABA94369   Aba94369 Human BCR
    34    115.4     4.3    2883   6  ABZ35528   Abz35528 Human gen
    35      115     4.3    2352   4  ABL05135   Abl05135 Drosophil
    36    113.8     4.3    1998   6  AAL42413   Aal42413 Human BCR
    37    112.6     4.2    4646   7  ADA68676   Ada68676 Spirodela
    38      109     4.1    2512   9  ADB62671   Adb62671 Human cDN
    39    106.8     4.0    3201   6  ABV74352   Abv74352 Human ABC
    40    105.2     3.9    2115   4  ABL07415   Abl07415 Drosophil
    41    105.2     3.9    2930   3  AAZ94747   Aaz94747 Human ATP
    42    105.2     3.9    2930   6  ABL63321   Abl63321 Breast ca
    43      105     3.9     727   4  AAH07859   Aah07859 Human cDN
    44      105     3.9    2077   4  AAH15008   Aah15008 Human cDN
    45    101.4     3.8     447   7  ABX46484   Abx46484 Bovine ES




ALIGNMENTS 



RESULT 1 
AAD48883 

ID AAD48883 standard; DNA; 2669 BP. 
XX 

AC AAD48883; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG8 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 



KW ABCG5; gene; ds.
XX
OS Homo sapiens.
XX
FH Key Location/Qualifiers
FT CDS 100..2121
FT /*tag= a
FT /product= "hABCG8 protein"



XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX

PR 20-NOV-2000; 2000US-0252235P.

PR 28-NOV-2000; 2000US-0253645P.
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31705. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or

PT nutritional deficiencies. 

XX 

PS Claim 13; Page 80; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG8 DNA 
XX 

SQ Sequence 2669 BP; 595 A; 768 C; 722 G; 584 T; 0 U; 0 Other; 

Query Match 100.0%; Score 2669; DB 7; Length 2669; 
Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2669; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 GTGTCCCTGCTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCT 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 GTGTCCCTGCTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCT 60 

Qy 61 AAGAGAGCTGCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAG 120 

I I I I I I I I I I M I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 AAGAGAGCTGCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAG 120 



Qy 



121 GAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTG 180 



121 GAGAGAGGGCTGCCGTVAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTG 180 

181 TTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTG 240 

I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I I I I I I M I I I I I I I 
181 TTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTG 24 0 

241 GAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAG 300 

I I I I I I M I I I I M I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
241 GAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAG 300 

301 CTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGC 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 
301 CTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGC 360 

361 ATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCA 420 

I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I M I I I I I I I I I I I I M I I I M I I I I I I I 

361 ATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCA 420 

421 GGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATC 480 

I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
421 GGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATC 480 

481 AAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGT 54 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 AAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGT 540 

Qy 541 GTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTG 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 541 GTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTG 600 

Qy 601 GCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGG 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
Db 601 GCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGG 660 

Qy 661 GTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAAC 720 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 661 GTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAAC 720 

Qy 721 ATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTC 780 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 721 ATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTC 780 

Qy 781 CTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACA 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 781 CTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACA 840 

Qy 841 GCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATC 900 

M I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I 
Db 841 GCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATC 900 

Qy 901 TCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACG 960 

I I I I I I I I I I I I I I Ml I I M I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 901 TCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACG 960 

Qy 961 TCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATC 1020 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 961 TCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATC 1020 

Qy 1021 GGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATT 1080 

I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1021 GGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATT 1080 

Qy 1081 GACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCC 1140 

I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I 

Db 1081 GACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCC 1140 

Qy 1141 CTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGAT 1200 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 1141 CTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGAT 1200 

Qy 1201 CTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCG 1260 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1201 CTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCG 1260 

Qy 1261 AGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATT 1320 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 1261 AGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCGGTCGTCAGATT 1320 

Qy 1321 TCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATG 1380 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 1321 TCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATG 1380 

Qy 1381 TCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGAT 1440 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 

Db 1381 TCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATGCAGCTCTCCTTCATGGAT 1440 

Qy 1441 ACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTC 1500 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 1441 ACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTC 1500 

Qy 1501 ATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTAC 1560 

I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1501 ATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTAC 1560 

Qy 1561 ACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTAC 1620 

I I I I I I I I I I I I I I I I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I M I I 
Db 1561 ACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTAC 1620 

Qy 1621 ATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCC 1680 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1621 ATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCC 1680 

Qy 1681 TTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTG 1740 

I I I I I I I I I I I I I I M I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1681 TTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTG 1740 

Qy 1741 GCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTAC 1800 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1741 GCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTAC 1800 

Qy 1801 AACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCC 1860 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M I 
Db 1801 AACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCC 1860 



Qy 1861 GCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAG 1920 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
Db 1861 GCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAG 1920 

Qy 1921 TTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGAT 1980 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I LI M I I I I 
Db 1921 TTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGAT 1980 

Qy 1981 AAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTC 2040 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1981 AAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTC 2040 

Qy 2041 ATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAG 2100 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2041 ATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAG 2100 

Qy 2101 AAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGC 2160 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2101 AAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGC 2160 

Qy 2161 AGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGA 2220 

I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I 
Db 2161 AGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGA 2220 

Qy 2221 CCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAG 2280 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
Db 2221 CCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAG 2280 

Qy 2281 GATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGAT 2340 

I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2281 GATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGAT 2340 

Qy 2341 GACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGA 2400 

M I I I I I M I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I M I I I I I M I I I I I I I I I I 
Db 2341 GACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGA 2400 

Qy 2401 TATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAG 2460 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 2401 TATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAG 2460 

Qy 2461 CTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTG 2520 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2461 CTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTG 2520 

Qy 2521 GCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACC 2580 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2521 GCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACC 2580 

Qy 2581 TAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2581 TAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640 

Qy 2641 CAACGTGAACAATTAAAAATGTATTGAGC 2669 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 2641 CAACGTGAACAATTAAAAATGTATTGAGC 2669 



RESULT 2 
ABK83218 

ID ABK83218 standard; cDNA; 3239 BP. 
XX 

AC ABK83218; 
XX 

DT 27-AUG-2002 (first entry) 
XX 

DE Human transporter and ion channel, TRICH9, Incyte ID 6585710CB1, cDNA. 
XX 

KW Human; ss; gene; transporter and ion channel; TRICH; transport disorder; 

KW neurological disorder; muscle disorder; immunological disorder; cancer; 

KW scleroderma; systemic lupus erythematosus; allergy; leukaemia; 

KW cell proliferative disorder; cervical cancer; breast cancer; 

KW neurodegenerative disorder; Parkinson's disease; Alzheimer's disease; 

KW myotonic dystrophy; catatonia; endocrine disorder; diabetes; 

KW Grave's disease; gastrointestinal disorder; Crohn's disease; 

KW renal disorder; Goodpasture's syndrome; viral infection; cirrhosis; 

KW bacterial infection; fungal infection; parasitic infection; 

KW protozoal infection; helminthic infection; cardiovascular disorder; 

KW atherosclerosis; hepatic disease. 

XX 

OS Homo sapiens. 
XX 

PN WO200240541-A2. 
XX 

PD 23-MAY-2002. 
XX 

PF 25-OCT-2001; 2001WO-US046055. 
XX 

PR 27-OCT-2000; 2000US-0243989P. 

PR 03-NOV-2000; 2000US-0245904P. 

PR 09-NOV-2000; 2000US-0247673P. 

PR 17-NOV-2000; 2000US-0249661P. 

PR 20-NOV-2000; 2000US-0252232P. 

PR 01-DEC-2000; 2000US-0250790P. 
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Tang YT, Yue H, Nguyen DB, Hafalia AJA, Elliott VS, Lu Y; 

PI Walia NK, Yao MG, Baughn MR, Gandhi AR, Ding L, Sanjanwala M; 

PI Ramkumar J, Arvizu C, Gietzen KJ, Lai PG, Azimzai Y, Khan FA; 

PI Thangavelu K, Thornton M, Lu DAM, Tribouley CM, Warren BA, Ison CH; 

PI Das D, Raumann BE, Policky JL, Kearney L; 

XX 

DR WPI; 2002-463570/49. 

DR P-PSDB; ABG61539. 
XX 

PT New transporters and ion channels (TRICH) polypeptides, useful for 

PT diagnosing, preventing, and treating disorders associated with an 

PT abnormal expression or activity of TRICH, e.g. immunological, muscular or 

PT renal disorders. 

XX 

PS Claim 5; Page 167-168; 178pp; English. 
XX 

CC The invention relates to human transporters and ion channels (TRICH) 



CC polypeptides, a naturally occurring amino acid sequence 90 % identical to 

CC TRICH, a biologically active fragment of TRICH or an immunogenic fragment 

CC of TRICH. Also included are an isolated polynucleotide encoding TRICH, a 

CC recombinant polynucleotide comprising a promoter sequence operably linked 

CC to the TRICH polynucleotide, a cell transformed with the recombinant 

CC polynucleotide, a transgenic organism comprising the recombinant 

CC polynucleotide, an isolated antibody that binds specifically to TRICH, 

CC and screening for compounds which bind to TRICH, modulate TRICH, modulate 

CC TRICH expression or are ant/agonists of TRICH. The polypeptides are 

CC useful for diagnosing, treating, and preventing transport, neurological, 

CC muscle, immunological disorders (e.g. scleroderma, systemic lupus 

CC erythematosus, allergies), cell proliferative disorders such as cancers 

CC (e.g. leukaemia, cervical or breast cancers), neurodegenerative disorders 

CC (e.g. Parkinson's disease, Alzheimer's disease), muscular disorders (e.g. 

CC myotonic dystrophy, catatonia), endocrine disorders (e.g. diabetes, 

CC Grave's disease), gastrointestinal disorders (e.g. Crohn's disease), 

CC renal disorders (e.g. Goodpasture's syndrome), viral, bacterial, fungal, 

CC parasitic, protozoal and helminthic infections, cardiovascular disorders 

CC (e.g. atherosclerosis), or hepatic diseases (e.g. cirrhosis) and many 

CC other diseases and disorders detailed in the specification. They can also 

CC be used in assessing the effects of exogenous compounds on the expression 

CC of nucleic acid and amino acid sequences of transporters and ion 

CC channels. TRICH or its fragments may also be used in screening for 

CC compounds that specifically bind to and modulate the activity of TRICH. 

CC The polynucleotides can be used to create knock-in humanised animals or 

CC transgenic animals to model human disease. The present sequence encodes a 

CC TRICH protein 

XX 

SQ Sequence 3239 BP; 784 A; 822 C; 796 G; 837 T; 0 U; 0 Other; 

Query Match 63.0%; Score 1680.6; DB 6; Length 3239; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 1683; Conservative 0; Mismatches 4; Indels 0; Gaps 0; 

Qy 983 GGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 1042 

III I II I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I 

Db 12 GGGGCGGCCAGCACATGGTCCATTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 71 

Qy 1043 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 1102 

I I I I II I I I II I M I I II I I II II I I I I I II I I I II I II I I I I I I I I I I I I I I I I I I II I 

Db 72 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 131 

Qy 1103 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 1162 

I M I I I I I I I I M M I I I I I I M II I M M I M I II I I II I II I I I I I I M II M I M I I 

Db 132 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 191 

Qy 1163 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 1222 

I I I I II I I I II II I M I I I II M M I I II I I II M I I I I II I I II II I M II I M II I II 

Db 192 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 251 

Qy 1223 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 1282 

II I I I I I I I I II I I I I I II I I I I I I I I I I I II I I I II II I II I I II I I I I I I II I I I I I I 

Db 252 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 311 

Qy 1283 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 1342 

I I II I 1 I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I II II I I II I I 
Db 312 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 371 



Qy 1343 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 1402 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I M I I 
Db 372 CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 431 



Qy 1403 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 1462 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I 
Db 432 ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 491 

Qy 1463 TCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 1522 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 
Db 492 TCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 551 

Qy 1523 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 1582 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 552 GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 611 

Qy 1583 CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 1642 

I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 612 CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 671 

Qy 1643 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 1702 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 672 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 731 

Qy 1703 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 1762 

I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 732 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 791 

Qy 1763 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 1822 

I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 792 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 851 

Qy 1823 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 1882 

I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 852 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 911 

Qy 1883 TCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAA 1942 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 912 TCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAA 971 

Qy 1943 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 2002 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 972 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 1031 

Qy 2003 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 2062 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I 
Db 1032 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 1091 

Qy 2063 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGAT 2122 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I 
Db 1092 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGAT 1151 

Qy 2123 TCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCT 2182 

I M I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 
Db 1152 TCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCT 1211 



Qy 2183 CCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACAT 2242 

I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1212 CCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACAT 1271 

Qy 2243 CCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAG 2302 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1272 CCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAG 1331 

Qy 2303 TCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTC 2362 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1332 TCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTC 1391 

Qy 2363 GGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTC 2422 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1392 GGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTC 1451 

Qy 2423 GATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTG 2482 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1452 GATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTG 1511 

Qy 2483 GAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCA 2542 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1512 GAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCA 1571 

Qy 2543 CACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCC 2602 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1572 CACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCC 1631 

Qy 2603 ATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAATTAAAAATGT 2662 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1632 ATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAATTAAAAATGT 1691 

Qy 2663 ATTGAGC 2669 

I I I I I I I 

Db 1692 ATTGAGC 1698 
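
The summary figures printed for each hit can be cross-checked with a short calculation. The sketch below is not part of the search program's output, and its two formulas are assumptions: it takes Query Match to be the hit score divided by the perfect score (2669 for this query), and Best Local Similarity to be the number of matches divided by the number of aligned columns (matches + conservative + mismatches + indels). The function names are hypothetical. Under those assumptions the values reported for RESULT 2 and RESULT 3 are reproduced to one decimal place.

```python
# Hypothetical helpers for sanity-checking the per-hit statistics in this report.
# Both formulas are assumptions inferred from the printed numbers, not a
# documented definition taken from the search software.

PERFECT_SCORE = 2669  # "Perfect score" listed for query US-09-989-981A-7


def query_match(score, perfect_score=PERFECT_SCORE):
    """Percent of the perfect score achieved by a hit."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches, mismatches, indels, conservative=0):
    """Percent of aligned columns that are exact matches."""
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns


if __name__ == "__main__":
    # RESULT 2 (ABK83218): Score 1680.6; Matches 1683; Mismatches 4; Indels 0
    print(round(query_match(1680.6), 1), round(best_local_similarity(1683, 4, 0), 1))
    # RESULT 3 (ABN90022): Score 1499; Matches 1933; Mismatches 520; Indels 52
    print(round(query_match(1499), 1), round(best_local_similarity(1933, 520, 52), 1))
```

A disagreement between a recomputed value and the printed one would suggest a transcription error in the statistics lines rather than in the alignment itself.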



RESULT 3 
ABN90022 

ID ABN90022 standard; cDNA; 2564 BP. 
XX 

AC ABN90022; 
XX 

DT 16-AUG-2002 (first entry) 
XX 

DE Mouse clone IMX3_67 extended sequence. 
XX 

KW Mouse; antiinflammatory; gene therapy; ileitis; DST; ss; TOGA; 

KW digital sequence tag; total gene expression analysis. 

XX 

OS Mus musculus. 
XX 

PN WO200231114-A2. 
XX 

PD 18-APR-2002. 
XX 

PF 11-OCT-2001; 2001WO-US032091. 



XX 

PR 11-OCT-2000; 2000US-0239483P. 
XX 

PA (DIGI-) DIGITAL GENE TECHNOLOGIES INC. 
XX 

PI Viney JL, Sims JE, Dubose RF, Baum PR, Hasel KW, Hilbush BS; 
XX 

DR WPI; 2002-426279/45. 
XX 

PT New isolated nucleic acid molecules that are associated with ileitis, for 

PT preventing, treating, modulating and diagnosing ileitis in a mammalian 

PT subject. 
XX 

PS Claim 1; Page 266-268; 273pp; English. 
XX 

CC The invention relates to a novel isolated nucleic acid molecule 

CC comprising a polynucleotide having one of 90 polynucleotide sequences, 

CC given in the specification. The polynucleotides of the invention have 

CC antiinflammatory activity, and may have a use in gene therapy. The 

CC polynucleotide or a polypeptide encoded by it is used for preventing, 

CC treating, modulating or ameliorating a medical condition such as ileitis. 

CC The polypeptide or polynucleotide is also useful for manufacturing a 

CC medicament for treating ileitis. The sequence represents an extended 

CC cDNA digital sequence tag obtained from a mouse clone by the TOGA (total 

CC gene expression analysis) method 

XX 

SQ Sequence 2564 BP; 623 A; 722 C; 638 G; 581 T; 0 U; 0 Other; 

Query Match 56.2%; Score 1499; DB 6; Length 2564; 

Best Local Similarity 77.2%; Pred. No. 0; 

Matches 1933; Conservative 0; Mismatches 520; Indels 52; Gaps 7; 

Qy 99 CATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATAC 158 

I I I I I I I I M I M I I I I I I I I I I I I I I I I I I I I I I I I 

Db 34 CATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGC 93 

Qy 159 CTC---GGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCAC 215 

II I I I I I I I I I I I I II I I I I I II I I II I II II I I I I I I M I I I I I I I I I I I I 
Db 94 TTCGCAGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCAC 153 

Qy 216 CTACAGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGC 275 

I I I I I I I I I III I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I II 

Db 154 CTACAGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGC 213 

Qy 276 CTCTCAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAG 335 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II 

Db 214 CTCTCAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAG 273 

Qy 336 CTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCA 395 

I I I II I II I I I II I I I I I I II I II M I II I I I I I I I I I I I I I I Mill II 
Db 274 CAGCCAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACA 333 

Qy 396 GATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCAC 455 

M I I II I I I I I I M I I I I I I I I I I I I I II I I I I I I I I I I I I II II I I I II I I I 
Db 334 GATGCTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCAC 393 



Qy 456 TGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAG 515 



Db 394 AGGCAGAGGCCACGGTGGCAAGATGAAATCAGGACTU^TTTGGATAAATGGGCAACCCAG 453 

Qy 516 CTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCC 575 

I I I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I M I II 

Db 454 TACGCCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCC 513 

Qy 576 CAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTT 635 

MM MM M I I II I M I I I I I II I I I I II I I II I I II I M I I I I II Mill 

Db 514 CAACCTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTT 573 

Qy 636 CTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCA 695 

I I I I I I I M I I I II I I M I I I I M I II I I I I I I I I I I I I I I I II I I II II I I I I 

Db 574 CTCCCAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCA 633 

Qy 696 GTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAG 755 

II II I I II I I I I II II I I I I II III M II III I I II I II II II I M I I I 

Db 634 GTGCGCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCG 693 

Qy 756 GAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACC 815 

II II I I I II II I II I I I I II I I I II I M I I I II I I I II M M II I I I II II II I 

Db 694 ACGAGTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACC 753 

Qy 816 CACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGC 875 

III II I I I I I II I I I I I I I I I I I I I I I I I I I M I I I II II I II I II I I II I I I 

Db 754 CACTTCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGC 813 

Qy 876 CAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCT 935 

III II I II I II M I I II I I I I I I II I I I I II I I I I I I II I I II I I II II II I I I I I II 
Db 814 CAAGGGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCT 873 

Qy 936 GTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCA 995 

I I I II II I I I I II I I II I I I I I I II I I II I M I II I II I II I I I I I I Mill 

Db 874 ATTTGACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCA 933 

Qy 996 CATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGA 1055 

II II I I II II II I I II II I I III I II I II I I I II II II II I II II II I II 

Db 934 AATGGTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGA 993 

Qy 1056 CTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAG 1115 

Mini I II I II I I I I I I II I I II II I II I II I I III I III I I II II I I 
Db 994 CTTCTACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGT 1053 

Qy 1116 GGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTT 1175 

II I I I II I I I I I I I II I I I I II II I I I M I II I I I I I I I I III M II I I I I 
Db 1054 GGAGAAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTT 1113 

Qy 1176 TCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGAC 1235 

II I I II M II I II I I I II I I I I II II II I III 

Db 1114 TCTGTGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCT 1173 

Qy 1236 CCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTT 1295 

I II I Mill MM I I I I I M I I II I I I II I I I I I I I 

Db 1174 CACACAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTT 1230 

Qy 1296 TACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCAT 1355 

I I II I II I II II II II I I I I II M II I II I II I I I I II II II II I I II Mill 



Db 1231 TTCCACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCAT 1290 

Qy 1356 CCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGG 1415 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1291 TCATGGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGG 1350 

Qy 1416 GAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCAT 1475 

I II II I .1 I I I I II M I I I II I I II I I I I I II I I II I I I I I II II I I I I I 
Db 1351 GGCCAAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCAT 1410 

Qy 1476 CCCTTTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTA 1535 

I I I I I I I I I I I I I I I I M II I I I II II I I I II I I I I I MINI I I I I I I I M 

Db 1411 TCCTTTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTA 1470 

Qy 1536 CTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGG 1595 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I M I I I II I I I I II 
Db 1471 CTATGAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGG 1530 

Qy 1596 GGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGC 1655 

II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 

Db 1531 AGAATTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGAC 1590 

Qy 1656 CAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGT 1715 

MINI I I I I I I III I I I II I II I I II I I I I I II I I I I I I I I I I I I I 
Db 1591 AAACCTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGT 1650 

Qy 1716 CTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGC 1775 

I I I I I I I II I I I I I I I I I I I M I I III I III I I I I I I I I I I II I I I M I I I 

Db 1651 CTTCTGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTC 1710 

Qy 1776 CTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAA 1835 

II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M I I 

Db 1711 CTCCTTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAA 1770 

Qy 1836 CTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTG 1895 

I I I I II I I I I I I I I MINI II Mill Mill MM I II II I I II I M I 

Db 1771 CTTGGACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTG 1830 

Qy 1896 TTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAA 1955 

II I I M I II I II II I II II I II I I I I I I I II I II 

Db 1831 CTTCTCGGTGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAA 1890 

Qy 1956 CCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCC 2015 

I I I II I I I I II I II II I II I I I I II II II II I I I I II I II I I I 

Db 1891 CTTCACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCC 1950 

Qy 2016 TCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTA 2075 

Mill II II II I II I I M I I I I I III Mill I II I II II I I I I II M I I 
Db 1951 ACTCTATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTA 2010 

Qy 2076 CGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGATTCACGCCAGACGT 2135 

I II I I I I I I II I I I M I I I I I II I I I I M I II I I II I I I I I I 
Db 2011 TCTATCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGATACTCAGCCTTGCT 2070 

Qy 2136 CTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCC 2195 

II II II I II M I I II I I I I 

Db 2071 CTCACTGGCGG------------GACCCTTTTCCCGGGGCTGGCCACCCCAGGAGGAGCC 2118 



Qy 2196 TTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACATCCGGCCCAGGGTG 2255 

I I I I I I I I I I III III III II II I I I I III 

Db 2119 GGACTGGGGACAAGGCTCACACAGATCTCTCAG-------GCAGCAGCCACCTCTTAGTG 2171 

Qy 2256 CTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAGTCGAAAGGGATTT 2315 

I I I I I I I I II I II I I I I II I I I I II I I I I I I I I I I II I I I I I I I I II II III III 
Db 2172 CTGCAGTGGCACAGATCAGCCACAGGATGGCAGTAGAATAAAGACAGTTGAGAGGTGTTT 2231 

Qy 2316 CTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTCGGTGGCACCTACA 2375 

I I I I I I I I I I I II III I I I I I I I III 
Db 2232 CTGCTCCCAGGCCCAGGTTTGTAATGGGAGAGAGAGAA ACCAGGT 2276 

Qy 2376 ACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTCGATATAGGATGGG 2435 

I I I I I I I I I I I I II III I I I I I I I I I I I I I I I I II I I I 

Db 2277 ACGTTGCTCATGCATTT TATATCTTTAAATAAACAACCCAGTATGGAATGGG 2328 

Qy 2436 AGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGA 2495 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2329 AACCAATTATATATGAATTGAGTAGCTAGGCTATGCAGAAATTTCTGGAATCCTGAGAGG 2388 

Qy 2496 ACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGA 2555 

III I I I I I I I I I I I I I I I I I I I I I I I 

Db 2389 ATAGTGGTTTATAGCAAAGTGTTTAACTTTCTCTTCTACCATTCTCACAC TGTTAA 2444 

Qy 2556 GCCACCATCAATACAGTW^GTGACCTAAGATGTACCAGCAAGATG 2600 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 2445 GCCACTCCCAATACAAAGGGCGACCTAAAACAAACTAGCAAAATG 2489 



RESULT 4 
AAD48881 

ID AAD48881 standard; DNA; 2019 BP. 
XX 

AC AAD48881; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG8 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds. 
XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 

FT CDS 1..2019 

FT /*tag= a 

FT /product= "mABCG8 protein" 

FT /transl_except= (pos:1318..1320, aa:Leu) 

XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 



PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31703. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 13; Page 75; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG8 DNA 
XX 

SQ Sequence 2019 BP; 444 A; 598 C; 510 G; 467 T; 0 U; 0 Other; 

Query Match 53.6%; Score 1430; DB 7; Length 2019; 
Best Local Similarity 82.0%; Pred. No. 0; 

Matches 1659; Conservative 0; Mismatches 360; Indels 3; Gaps 1 

Qy 100 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 159 

Db 1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60 

Qy 160 TCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTAC 219 
I I I I I I I I I I I I I I II I II I I I I I I II I II I I I II II I I I I I II II II II II II II 
Db 61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120 

Qy 220 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 279 

Db 121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180 

Qy 280 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCT^GATGCCCTGGACATCTCCCAGCTGC 339 

Db 181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240 

Qy 340 CAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATG 399 

Db 241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300 

Qy 400 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 459 

Db 301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360 

Qy 460 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 519 

I I I I I I I I I I I M I I I I II I II I I II II I II I I II I I I M I II I I I II 
Db 361 AGAGGCCACGGTGGCAAGATGAT^TCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420 

Qy 520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579 

II II I I I I II I I I II I II I I II II II I I I I I I I I I I I II I I I I II I I I I II 
Db 421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480 

Qy 580 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 639 

MM M II M II II MM II II II II II M II II I I M I M M II II II II I 

Db 481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540 

Qy 640 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 699 

II II I II II I II II II II II II I II Mill I II I I I II I I M I M I II II II I I 
Db 541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600 

Qy 700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759 

II II II I I M II II I M I III II II Ml I II I I II II II II II I I II 
Db 601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660 

Qy 760 GTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 819 

II II M M II II I II I II I I II I II II II I II II M II II I I I II II II II I I II 

Db 661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720 

Qy 820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879 

II II I II II II II M I II II II I II M I I II I II I M M I M II I I II II II I 
Db 721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780 

Qy 880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939 

MUM II I M II II II I II II I M II II II II II II II M I II II II I M II II III 

Db 781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840 

Qy 940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999 

II II II II I I II II I II I M M I II II II II II I I I II II II II II II I III 
Db 841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900 

Qy 1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059 

II II I II I II II I I II I III II II II II I M II II I I I I I II II I II I II I 
Db 901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960 

Qy 1060 TATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAG 1119 

II II I II I II II II I II I II II I II II II I Ml I III II II II II II II 
Db 961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020 

Qy 1120 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1179 

II II I II II I II I II II II I II I II II II II II I I I Ml M II M II II I 
Db 1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080 

Qy 1180 TGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCA 1239 

M II II II I II I I II II II II I II I I III III 

Db 1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140 

Qy 1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299 

I II II I II II II I II II II II I II II II II I II I I , 



Db 1141 CAGGACACTGACTG TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197 

Qy 1300 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1359 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I II I II II III 

Db 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257 

Qy 1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419 

I I I I I II I I I I I I I I I I M I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317 

Qy 1420 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479 

I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II II I I I I I III 
Db 1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377 

Qy 1480 TTCAACGTCATTCTGGATGTCATCTCCj^AATGTTACTCAGAGAGGGCAATGCTTTACTAT 1539 

I I I I I II I I I I I I I II I I I I I I I II I I I II I I I I I I I I I I I II I I I I I I I I I I 

Db 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCT^TGCTGTACTAT 1437 

Qy 1540 GAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAG 1599 

II I I I I I I I I I I I I M II I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I II II 

Db 1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497 

Qy 1600 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659 

I I I I I I I I I II I I I I I M I I I I I I I I I I I I I I I I I I I I I I M M I M I I I III 

Db 1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557 

Qy 1660 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719 

III 1 1 1 1 I I III I II 1 1 1 II 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617 

Qy 1720 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779 

II I I I I I I I I I I I I I I I I I I III I III I I I I I I I I I I I I I I I I I I I Mill 

Db 1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677 

Qy 1780 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATTWVCTTG 1839 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737 

Qy 1840 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899 

II M I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797 

Qy 1900 GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1959 

I I I I I II I I I I I I I I I M I II I I I I I I I I I I I I I M 

Db 1798 TCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTC 1857 

Qy 1960 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTC 2019 

I I I I I I II MINI II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1858 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1917 

Qy 2020 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079 

II II I I I I I I I M I I I I I I I III I I I I I I I I I I I II I I I I I I I I I I I 

Db 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977 

Qy 2080 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2121 

I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I M I I 

Db 1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019 



RESULT 5 
AAH98911 

ID AAH98911 standard; cDNA; 580 BP. 
XX 

AC AAH98911; 
XX 

DT 12-OCT-2001 (first entry) 
XX 

DE Arabidopsis EST-derived coding sequence SEQ ID NO: 768. 
XX 

KW Human; sheep; pig; cow; fruit fly; yeast; hamster; macaque; horse; 

KW tomato; monkey; dog; sea urchin; expressed sequence tag; EST; 

KW diagnostics; forensic test; gene mapping; genetic disorder; biodiversity; 

KW gene therapy; nutrition; ss. 

XX 

OS Arabidopsis thaliana. 
XX 

PN WO200154477-A2. 
XX 

PD 02-AUG-2001. 
XX 

PF 25-JAN-2001; 2001WO-US002687. 
XX 

PR 25-JAN-2000; 2000US-00491404. 

PR 17-JUL-2000; 2000US-00617746. 

PR 03-AUG-2000; 2000US-00631451. 

PR 15-SEP-2000; 2000US-00663870. 
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Tang YT, Liu C, Zhou P, Qian XB, Wang Z, Chen R, Asundi V; 

PI Cao Y, Drmanac RA, Zhang J, Werhman T; 

XX 

DR WPI; 2001-476164/51. 

DR P-PSDB; AAM24252. 
XX 

PT Isolated polypeptide for treatment of diseases, diagnostics, raising 

PT antibodies and research use. 

XX 

PS Claim 1; Page 664; 1275pp; English. 
XX 

CC The present invention provides the protein and coding sequences of novel 

CC proteins from a variety of organisms, including human, dog, cat, horse, 

CC cow, pig, hamster, monkey, macaque, yeast, bacteria, fruit fly, sea 

CC urchin and tomato. These were derived from expressed sequence tags (ESTs) 

CC from the organism of interest. They can be used in diagnostics, 

CC forensics, gene mapping, identification of mutations, to assess 

CC biodiversity and for nutritional purposes. The present sequence is a cDNA 

CC of the invention 

XX 

SQ Sequence 580 BP; 146 A; 154 C; 116 G; 164 T; 0 U; 0 Other; 



Query Match 10.9%; Score 291.6; DB 4; Length 580; 
Best Local Similarity 97.1%; Pred. No. 9.9e-68; 
Matches 297; Conservative 0; Mismatches 9; Indels 0; Gaps 0; 



Qy 1509 ATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGG 1568 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 275 AGGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGG 334 

Qy 1569 TCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCAT 1628 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I I I 
Db 335 TCCATATTTCTTTGCCAAGATCCTCGGCGAGCTTCCGGAGCACTGTGCCTACATCATCAT 394 

Qy 1629 CTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCT 1688 

I I M I I I I I I M I I M I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 395 CTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCT 454 

Qy 1689 GCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGC 1748 

I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 455 GCACTTCCTGCTGGAGTGGCTGGCGGTCTTCTGTTGCAAGATTATGGTCCTGGCCGCCGC 514 

Qy 1749 GGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTT 1808 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 515 GGGCCTGCTCCCCACCTTACACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTGCTT 574 

Qy 1809 CTACCT 1814 

I I I I I I 

Db 575 CTACCT 580 



RESULT 6 
ABK51681 

ID ABK51681 standard; DNA; 1920 BP. 
XX 

AC ABK51681; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding human ABCG5 protein. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21; ds. 
XX 

OS Homo sapiens. 



XX 

FH Key Location/Qualifiers 

FT CDS 1..1920 

FT /*tag= a 

FT /product= "Human ABCG5 protein" 

FT /transl_except= (pos: 4..9, aa: GDLSSLTPGGSMGL) 

FT /note= "This sequence contains 13 exons" 

XX 



PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 



PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 



DR P-PSDB; AAU98984. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 38; Page 36-37; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the human ABCG5 gene located on chromosome 2p21. 

CC This sequence encodes the human ABCG5 protein of the invention 

XX 

SQ Sequence 1920 BP; 440 A; 503 C; 486 G; 491 T; 0 U; 0 Other; 



Query Match 7.6%; Score 203.6; DB 6; Length 1920; 

Best Local Similarity 54.4%; Pred. No. 8.2e-44; 

Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1 

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

I I I I I I I I III I I I I I I III I I I I I I II I I 

Db 143 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 202 

Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 203 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 262 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

I I I I III III I I I I I I I I I I I I I I 

Db 263 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 322 

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574 

I I I I I I I I I I I I I I M I I III III I I I II 



Db 323 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 382 



Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 383 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 442 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

II M I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I 

Db 443 ATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 499 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I Mill I I I I I I I I I I I I I I I III I I I I I I I I I 

Db 500 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 559 

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814 

I MM II I I I II I II II I I II III I I I I II I 

Db 560 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 619 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 

I M I I I I II II I M I I I II I I III I II II II 

Db 620 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 679 

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I II II I I I II I I I I I I I I I II I I I I I I I I M III M 

Db 680 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 739 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I I I I II III I II I II I I I M III I 

Db 740 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 799 

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

Mill II I I I I II I II II I II II I Ml I I II I I M 

Db 800 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 859 

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I II M II II I I I II II I II I I I I I I I I I II I I M III 

Db 860 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 919 

Qy 1115 GGGAGAAGGCTCAG 1128 

III I III 

Db 920 CCAAGAGAGTCCAG 933 



RESULT 7 
AAD22009 

ID AAD22009 standard; DNA; 2340 BP. 
XX 

AC AAD22009; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG). 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21; ds. 



XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 107..2062 

FT /*tag= a 

FT /product= "Human SSG protein" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; AAE13290. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG DNA. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 

Query Match 7.6%; Score 203.6; DB 6; Length 2340; 
Best Local Similarity 54.4%; Pred. No. 9.1e-44; 

Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1 

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

I I I I I I I I III I II I I I III I I I I I I I I I I 

Db 285 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 344 



Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

I I I I II I II I I I I I I I I I M II I II I I I II I I I I I I I I II 

Db 345 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 404 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

I I I I III III II I I I I I I I I I I I I 

Db 405 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 464 

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574 

I I I I I I I I I I I II I I I I I III III I I II I 

Db 465 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 524 

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

II I I I I I I I II I I I I I II II I I I I I I I I I I I I I 

Db 525 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 584 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

II II I I I I I I I I M I I I I II I I I I I I I I I I I I I I I I 

Db 585 ATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 641 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I I I I I I I I I II I I I I I I I I I I III I I I I I I I I I 

Db 642 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 701 

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814 

I I I I I I I I I I I I I I I I I MM III I I I I I I I 

Db 702 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 761 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 

I I I I I I I I I I I I I I I I I I I I I III III I I I I 

Db 762 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 821 

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II 

Db 822 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 881 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I I I I II III I II I I I I I I II III I 

Db 882 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 941 

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I II 

Db 942 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 1001 

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I I I I I I I I I I I I I I I I I I I I I INI I I I I I I I I I III 

Db 1002 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 1061 

Qy 1115 GGGAGAAGGCTCAG 1128 

III I III 

Db 1062 CCAAGAGAGTCCAG 1075 



RESULT 8 
AAD48882 

ID AAD48882 standard; DNA; 2340 BP. 



XX 

AC AAD48882; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds . 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 107..2062 

FT /*tag= a 

FT /product= "hABCG5 protein" 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 . 
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31704. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 77; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 DNA 
XX 

SQ Sequence 2340 BP; 541 A; 601 C; 598 G; 600 T; 0 U; 0 Other; 



Query Match 7.6%; Score 203.6; DB 7; Length 2340; 

Best Local Similarity 54.4%; Pred. No. 9.1e-44; 



Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1; 

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

I I I I I I I I III I I I I I I III I I II I I I I I I 

Db 285 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 344 

Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

I I I I II II II I I I I I II I I I II I I I I I I I I I I I I I I I I II 
Db 345 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 404 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

I I I I IN III II I I I I I I I I I I I I 

Db 405 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 464 

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574 

I I I I I I I I II I I I I I I I I III III I I I I I 
Db 465 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 524 

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

I I I I I I II I I I I I I I I II MM I I I I I I I I I I I 

Db 525 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 584 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

II II I I I I I I M I II M II I I I I I II I I II I I I I I I 

Db 585 ATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 641 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I Mill I I I II M I I I I M I I III II I I II I I I 

Db 642 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 701 

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814 

I I I I I II I I I I II I I M I I II III I I II I I I 

Db 702 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 761 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 

I I I I I I I M II I I I I II I I I I III III I II I 

Db 762 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 821 

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I II II I I I I I I II I I I II I I I II I I I I II M Ml II 

Db 822 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 881 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I II I II III I II I II I II II III I 

Db 882 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 941 

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

I I I I I II I I I I II I II I M II I II I I I I I I I I I II 

Db 942 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 1001 

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I II II I I I I I I I I I II I I I I I I I I I I I I I I M I I III 

Db 1002 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 1061 

Qy 1115 GGGAGAAGGCTCAG 1128 

III I III 

Db 1062 CCAAGAGAGTCCAG 1075 



RESULT 9 
ABK51682 

ID ABK51682 standard; cDNA; 2516 BP. 
XX 

AC ABK51682; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 cDNA sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW chromosome 2p21; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 37-38; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 



CC disease. The method of the invention is useful for increasing cholesterol 
CC excretion and/or decreasing cholesterol adsorption. The present nucleic 
CC acid sequence represents the cDNA sequence of human ABCG5 gene located on 
CC chromosome 2p21 
XX 

SQ Sequence 2516 BP; 601 A; 631 C; 636 G; 648 T; 0 U; 0 Other;
Query Match 7.6%; Score 203.6; DB 6; Length 2516; 
Best Local Similarity 54.4%; Pred. No. 9.4e-44; 

Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1; 

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

I I I I I I I I III I II I I I III I I II I I II I I 

Db 319 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 378 

Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II 

Db 379 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 438 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

I I I I III III I I I I I I I I I I I I I I 

Db 439 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 498

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574

I I I I I I I I I I I I I I I I I I III III I I I I I 
Db 499 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 558

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

I I I I I I I I I I I I I I II II I I I I II I I I I I I I I I 

Db 559 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 618 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

II II I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I 
Db 619 ATCCCGGCTCCTTCC AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 675 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I I I I I I I I I I I I I I I I I I I I I III I I I I I II II 

Db 676 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 735 

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814 

I I I I I I I I I I I I I I I I I I I I I III I I I I I I I 

Db 736 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 795 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 

I I M I I I I I I I I M I II I I I I III III I I I I 

Db 796 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 855

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II 

Db 856 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 915 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I I I I II III I II I I I I I I II III I 

Db 916 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 975 

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

I I I I I II I I I I II M I I I I I I I I I I I I I II I I I II 



Db 976 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 1035



Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 1036 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 1095 

Qy 1115 GGGAGAAGGCTCAG 1128

III I III 
Db 1096 CCAAGAGAGTCCAG 1109 



RESULT 10 
ABK51686

ID ABK51686 standard; cDNA; 2035 BP. 
XX 

AC ABK51686; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; ss; 
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease.
XX 

OS Rattus sp. 
XX 

FH Key Location/Qualifiers

FT CDS 8..1965

FT /*tag= a 

FT /product= "Rat ABCG5 protein"

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 
PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M.
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
DR P-PSDB; AAU96986. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 
PT acid encoding the polypeptide, useful for treating sitosterolemia, 
PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45-46; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 
CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 



CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the rat ABCG5 protein of the invention. (Updated on 

CC 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 2035 BP; 481 A; 533 C; 537 G; 484 T; 0 U; 0 Other; 

Query Match 7.3%; Score 194.4; DB 6; Length 2035; 

Best Local Similarity 54.0%; Pred. No. 2.5e-41; 

Matches 421; Conservative 0; Mismatches 356; Indels 3; Gaps 1; 

Qy 360 CATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTC 419 

I I I I I I III I I II I I I II I I II I I I I I I I I I I I II 

Db 214 CCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCCAGACCATGTGCATCTTAGGTAGCTC 273 

Qy 420 AGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGAT 479

I I I I I I I I I I I I I I I I I I I I I I II II I I II M 

Db 274 AGGCTCAGGGAAAACCACGCTGCTGGACGCCATCTCTGGGAGGCTGCGGCGCACAGGGAC 333 

Qy 480 CAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTG 539

I I I I I I I I Mill I I I I III 

Db 334 CTTGGAAGGGGAAGTGTTTGTGAACGGCTGCGAGCTGCGCAGGGACCAGTTCCAAGACTG 393 

Qy 540 TGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTT 599

II I I I I I I III II III I I I I I I I I II I I I I I I I 

Db 394 CGTCTCCTACCTCCTGCAGAGCGATGTCTTTCTGAGCAGCCTCACGGTGCGGGAGACGCT 453 

Qy 600 GGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAG 659 

I III M I I I I I I I I I I I I II I III II I I I I 

Db 454 GAGATACACGGC GATGCTGGCTCTCCGCAGCAGCTCCGCGGACTTCTACGACAAGAA 510 

Qy 660 GGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAA 719 

I I I I II I II I I I I II I I I II I I I I I I I II I I I I I I 

Db 511 GGTAGAGGCAGTCCTGACAGAGCTGAGTCTGAGCCACGTGGCAGACCAAATGATCGGCAA 570 

Qy 720 CATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCT 779 

I I I II II I I I I I M I II M I MM II I I I MM 

Db 571 CTATAATTTTGGGGGGATTTCCAGTGGCGAGCGGCGCCGAGTGTCCATCGCAGCCCAACT 630 

Qy 780 CCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCAC 839 

II I I I I I I I I I II I I II II I I I M I I I I II I II II I II 

Db 631 CCTTCAGGACCCCAAGGTCATGATGCTTGACGAGCCAACCACAGGACTGGACTGCATGAC 690 



Qy 840 AGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCAT 899

II I I III I I I I I I I I I I I I II I I I III I I 

Db 691 TGCAAATCATATCGTCCTCCTCTTGGTCGAGCTGGCTCGCAGGAACCGCATTGTAATTGT 750

Qy 900 CTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGAC 959 

I I I M I I I I I M I I I I I I I I I I Mill I I I I I I II I I I I 

Db 751 CACCATCCACCAGCCTCGCTCTGAGCTCTTCCACCACTTCGACAAAATTGCCATTCTGAC 810 

Qy 960 GTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCAT 1019 

I II I I .11 II III I I I M I I I I I I I I I 

Db 811 TTACGGAGAGTTGGTGTTCTGTGGCACGCCAGAGGAGATGCTCGGCTTCTTCAATAACTG 870 

Qy 1020 CGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCAT 1079

I I I II I I I I I II II I I I I I I I I I I I I I I I I M I I I I I I I 

Db 871 TGGTTACCCCTGTCCTGAACATTCCAATCCCTTTGATTTCTACATGGACTTGACATCGGT 930 

Qy 1080 TGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGC 1139 

I II I I I M I I I I I I I II II II II I III I I I I I 

Db 931 GGACACCCAAAGCAGAGAGCGAGAGATAGAGACGTACAAGCGAGTCCAGATGCTGGAATC 990 



RESULT 11 
ABK51684 

ID ABK51684 standard; DNA; 1915 BP. 
XX 

AC ABK51684; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE DNA encoding mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW ds. 
XX 

OS Mus sp. 



XX 

FH Key Location/Qualifiers 

FT CDS 1..1915

FT /*tag= a 

FT /partial 

FT /product= "Mouse ABCG5 protein"

FT /transl_except= (pos: 1912..1915, aa: LGIVIFKVRDYLISR)

FT /note= "This sequence lacks a stop codon" 

XX 



PN WO200227016-A2.
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P .
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 



XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96985. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42-43; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the mouse ABCG5 protein of the invention 

XX 

SQ Sequence 1915 BP; 453 A; 502 C; 484 G; 476 T; 0 U; 0 Other; 

Query Match 7.2%; Score 193.4; DB 6; Length 1915; 

Best Local Similarity 53.4%; Pred. No. 4.5e-41; 

Matches 429; Conservative 0; Mismatches 371; Indels 3; Gaps 1;

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

II I I I I I I II I I I I I I I I I I I I II I I I 

Db 182 GCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCC 241 

Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

MM II II I I I I I I I II I I I II I II II III II I I I I I III 
Db 242 AGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCT 301 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 



Db 302 CCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGC 361

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574

Db 362 TGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGA 421

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634
II I I II I I I I I I I I I I I I I I III M I I I I I I III I 



Db 422 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC GATGCTGGCCCTCTGCCGCA 478 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694

I I I I III I I I I I I II M I I II II I I I I I I I I I I I I I 

Db 479 GCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCC 538 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I MM! I I I I I I I I I I I I I'll I II I I I I I 

Db 539 ACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGC 598

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814 

I I I I I I I I I I II I I I II I II II II I I I I I II I I I 

Db 599 GCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGC 658 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 

I I M I I I II I I I II I I I II I I III I I M I I I I I I 

Db 659 CAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGG 718

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I I I I I I I I I I I I I I I II I I M I I I II I I I I I I I I I I I I I 

Db 719 CTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAAC 778 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I I I I I II I I I I I I I I I I I I I I 

Db 779 ACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGG 838

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

I I I I I I I II I I I I I I I I I I I I I I M I MINI II 

Db 839 AGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTG 898

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I I I I I I I I I I M I I I I I I I II I I I I I II I I II I I II 

Db 899 ATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGT 958 

Qy 1115 GGGAGAAGGCTCAGTCACTCGCA 1137 

II I III I I I I 
Db 959 ACAAGCGAGTACAGATGCTGGAA 981 



RESULT 12 
AAD48880 

ID AAD48880 standard; DNA; 1959 BP. 
XX 

AC AAD48880; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 DNA. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy;

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5; gene; ds.
XX 

OS Mus sp. 
XX 



FH Key Location/Qualifiers 

FT CDS 1..1591

FT /*tag= a 

FT /product= "mABCG5 protein"
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 . 
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR P-PSDB; AAE31702.
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 11; Page 73; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 DNA
XX 

SQ Sequence 1959 BP; 468 A; 506 C; 495 G; 490 T; 0 U; 0 Other; 

Query Match 7.2%; Score 193.4; DB 7; Length 1959; 
Best Local Similarity 53.4%; Pred. No. 4.6e-41; 

Matches 429; Conservative 0; Mismatches 371; Indels 3; Gaps 1;



Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394

II I I I I I I II I I I I I I I I I I I II I I I I 

Db 182 GCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCC 241

Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454

MM M M II II II M I I M II I MM III II M II I III 

Db 242 AGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCT 301

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514

III I I II III I III MM I I I II I 

Db 302 CCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGC 361



Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574



I I I I III I II 1 1 II I III III III 

Db 362 TGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGA 421 

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

I I I I II I I II II I I I I I I II III II I I I I I I III I 
Db 422 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC GATGCTGGCCCTCTGCCGCA 478 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

I II I III I I I I I I I I I I I I I I I I I I II I I I I I I II I 

Db 479 GCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCC 538 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I I I I I I I I I I I I I I I I I I III I I I I I I I I 

Db 539 ACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGC 598 

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814

I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 599 GCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGC 658 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874 

I I I I I I I I I I I I I I I I I II I I III I I I I I I I I I I 

Db 659 CAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGG 718 

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I 

Db 719 CTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAAC 778 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I I I I M I I I I I I I I I I I I I I I 

Db 779 ACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGG 838

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 839 AGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTG 898 

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 899 ATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGT 958 

Qy 1115 GGGAGAAGGCTCAGTCACTCGCA 1137 

II I III I I I I 

Db 959 ACAAGCGAGTACAGATGCTGGAA 981 



RESULT 13 
AAD22008

ID AAD22008 standard; DNA; 2258 BP. 
XX 

AC AAD22008; 
XX 

DT 12-FEB-2002 (first entry)
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG).
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 



KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; ds.
XX 

OS Mus sp.
XX 

FH Key Location/Qualifiers

FT CDS 47..2005

FT /*tag= a 

FT /product= "Mouse SSG protein"
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR P-PSDB; A7^13289. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 8; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG DNA. Mouse SSG is located on chromosome 17 

XX 

SQ Sequence 2258 BP; 549 A; 579 C; 567 G; 563 T; 0 U; 0 Other; 

Query Match 7.2%; Score 193.4; DB 6; Length 2258; 
Best Local Similarity 53.4%; Pred. No. 4.9e-41; 

Matches 429; Conservative 0; Mismatches 371; Indels 3; Gaps 1 

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

II I I II I I II I I I I I I I II I I I I I I I I 

Db 228 GCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCC 287 



Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

I I I I II I I I I I I I I I I I I I I I I I I I I I III M I I I I I III 
Db 288 AGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCT 347 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

III I I II III I III II I I I II I I I 

Db 348 CCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGC 407

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574 

I I I I III I I I I I I I I III III III 

Db 408 TGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGA 467

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634

I I I I I I I I I I I I I I I I I I I I III II I I I I I I III I 
Db 468 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC GATGCTGGCCCTCTGCCGCA 524

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

I I I I III I I I I I II I I I I I I I I I I I I I I M I I I I I I 

Db 525 GCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCC 584 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I Mill I I I II I I I I I I I I II I I I I I I I I 

Db 585 ACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGC 644 

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814

I I I I I I I I I I M I I I I I I I I I I I I I I I II I I II I 

Db 645 GCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGC 704 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874

I I II I I I I I I I I II I I I I I I I III I I I I I I I I I I 

Db 705 CAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGG 764 

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 765 CTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAAC 824 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

I I I I I I I I I I I I I I I I II I I I I I I 

Db 825 ACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGG 884 

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

I I II I I I I I I I I I I II I I I I II I I I I I I I I I I II 

Db 885 AGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTG 944 

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I I I I I I I I II I I I I I I I I I I M I I I I I I I I I I I I I I II 

Db 945 ATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGT 1004 

Qy 1115 GGGAGAAGGCTCAGTCACTCGCA 1137 

II I Ml I I I I 
Db 1005 ACAAGCGAGTACAGATGCTGGAA 1027 



RESULT 14 
ABK51685 

ID ABK51685 standard; cDNA; 2354 BP. 



XX 

AC ABK51685; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 cDNA sequence. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW ss.
XX 

OS Mus sp. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 .
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence represents the cDNA sequence of the mouse ABCG5 gene of the 

CC invention 

XX 



SQ Sequence 2354 BP; 573 A; 604 C; 594 G; 583 T; 0 U; 0 Other; 



Query Match 7.2%; Score 193.4; DB 6; Length 2354; 

Best Local Similarity 53.4%; Pred. No. 5e-41; 

Matches 429; Conservative 0; Mismatches 371; Indels 3; Gaps 1; 

Qy 335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394 

II I I I II I II I I I I I I II I I I I I I I I I 

Db 320 GCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCC 379

Qy 395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454 

Mil II II I I I I I I I I I I I I I I I I I I I III I I I I II I III 

Db 380 AGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCT 439 

Qy 455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514 

III I I'll III I III I I I I I M I I I 

Db 440 CCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGC 499 

Qy 515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574 

I I I I III I I I I I I I I III III III 

Db 500 TGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGA 559 

Qy 575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634 

I I I I I I M II I I I I M I I I I III II I I I II I I I I I 

Db 560 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC GATGCTGGCCCTCTGCCGCA 616 

Qy 635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694 

I I I I III I I I I I I I I I I I I I I II I II Ml I I M II I 

Db 617 GCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCC 676 

Qy 695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754 

I I I I I I I I I I I I I I MM III I I I I II I I 

Db 677 ACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGC 736

Qy 755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814

I I I II MM I I I I II I I I I I II I I I II I I II M I 

Db 737 GCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGC 796 

Qy 815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874

I I I I I I I I I II I I I I I I M I I III Mill II II I 

Db 797 CAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGG 856 

Qy 875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934 

I I I II I Mill II II II I I II II I I II II II M I II I II I 

Db 857 CTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAAC 916 

Qy 935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994 

II I I I II II I II II I I I II 

Db 917 ACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGG 976

Qy 995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054 

II II I I I I II I II M M II II M II I I I II I I II 

Db 977 AGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTG 1036

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114 

I II II M I I II II I Mill I II M II I I II II M II II 

Db 1037 ATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGT 1096



Qy 1115 GGGAGAAGGCTCAGTCACTCGCA 1137 

II I III I I I I 
Db 1097 ACAAGCGAGTACAGATGCTGGAA 1119

RESULT 15 
ABK51687 

ID ABK51687 standard; cDNA; 1069 BP. 
XX 

AC ABK51687; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE cDNA encoding hamster ABCG5 protein. 
XX 

KW Hamster; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW ss. 
XX 

OS Cricetinae. 



XX 

FH Key Location/Qualifiers 

FT CDS 30..1049

FT /*tag= a 

FT /partial 

FT /product= "Hamster ABCG5 protein" 

FT /note= "This sequence lacks both a start and stop codon" 

XX 



PN WO200227016-A2.
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR P-PSDB; AAU96987. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 47; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 



CC compound which alters ABCG5 activity level comprising contacting a cell

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present nucleic 

CC acid sequence encodes the hamster ABCG5 protein of the invention. 

CC (Updated on 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 1069 BP; 266 A; 282 C; 273 G; 248 T; 0 U; 0 Other; 

Query Match 6.5%; Score 173.2; DB 6; Length 1069; 

Best Local Similarity 54.1%; Pred. No. 9.2e-36; 

Matches 397; Conservative 0; Mismatches 333; Indels 4; Gaps 2; 

Qy 418 TCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAG 477 

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I. II I 

Db 1 TCAGGCTCAGGGAAAACCACGTTGCT-GGTGCCATCTCCGGGAGGCTGCGACGCACAGGG 59 

Qy 478 ATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAG 537 

I I I I I I I I I I I I I I I I I I I I I 

Db 60 ACCCTGGAAGGGGAGGTGTTTGTGAACGGCCGTGAGCTGCGCAGGGACCAGTTCCAAGAC 119 

Qy 538 TGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACC 597 

II I I M III III I II M I II I 

Db 120 TGCTTCTCCTATGTCCTGCAGAGCGACGTCTTTCTGAGCAGTCTCACGGTGCGAGAGACG 179 

Qy 598 TTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAA 657

II I I I I II III I I I I I I I I I M III I I I I I I 

Db 180 CTGCGCTACACGGCGATGCTGGCCCTCCGCAGTAGCTCTTC---GGACTTCTATGACAAG 236

Qy 658 AGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGC 717 

I I I I II II I I I I I Mill I I I I I I I Mill I I II 

Db 237 AAGGTAGAGGCAGTCATGGAAGAGCTAAGTCTGAGCCACGTGGCAGACCGAATGATTGGC 296 

Qy 718 AACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAG 777

II I I I I II I III I I I II I M I Mill MM I II 

Db 297 AACTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGCGCCGAGTCTCCATCGCAGCCCAA 356 

Qy 778 CTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTC 837 

II I I I II M I II I II I II II II M I I II II I I I M I 

Db 357 CTCATTCAGGACCCCAAGATCATGATGTTTGATGAGCCAACCACAGGACTGGACTGCATG 416 

Qy 838 ACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTC 897 

I I I I I I I I I I I I I I II I I II I MM I M I II 

Db 417 ACTGCAAATCAAATTGTCATCCTCCTGGCAGAGCTGGCTCGCAGGGACCGCATTGTGATC 476 



Qy 898 ATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATG 957



I I I I I I I I I I I I I I I I I I I I I I I I I I I I MM I II M 

Db 477 GTCACCATCCACCAGCCTCGCTCTGAGCTCTTTCAACACTTCGACAAAATTGCCATCCTG 536 

Qy 958 ACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCC 1017 

III II I I I I II III II I I II M I I II II I 

Db 537 ACTTACGGAGAGATGGTGTTCTGTGGCACGCCGGAGGAAATGCTCGACTTCTTCAATAGC 596 

Qy 1018 ATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGC 1077 

II I I I I I I I II I I I I II M II I II M II I II II II II 

Db 597 TGTGGTTACCCTTGTCCTGAACATTCCAACCCCTTTGACTTCTACATGGACTTGACATCA 656 

Qy 1078 ATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCA 1137 

I I I I I II II II II II I II I I III III I III I I I I I 

Db 657 GTGGATACCCAGAGCAGAGAGCGAGAAATAGAAACCTACAAGAGAGTCCAGATGCTCGAA 716 

Qy 1138 GCCCTGTTTCTAGA 1151 

I II III 

Db 717 TCTGCCTTCAGAGA 730 
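
The two percentages printed for each hit appear to follow directly from the other figures in the same record. The sketch below is a minimal illustration under two assumptions inferred from the printed values, not taken from GenCore documentation: Query Match is the hit score divided by the query's perfect score (2669 in the run header for this query), and Best Local Similarity is Matches divided by the total number of alignment columns. The function names are hypothetical.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the query's perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, conservative: int,
                              mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all alignment columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Figures from the ABK51687 record; 2669 is the perfect score from the run header.
    print(round(query_match_pct(173.2, 2669), 1))
    print(round(best_local_similarity_pct(397, 0, 333, 4), 1))
```

Applied to the ABK51687 record, these reproduce the printed 6.5% and 54.1% after rounding to one decimal place.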
Job time : 680.985 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 
Title:
Sequence:

Scoring table:
(without alignments)
Searched: 682709 seqs, 277475446 residues
Minimum DB seq length: 0

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
ALIGNMENTS



RESULT 1 
US-09-245-808-2 

; Sequence 2, Application US/09245808 
; Patent No. 6313277 
; GENERAL INFORMATION: 
; APPLICANT: Doyle, L. Austin 
; APPLICANT: Abruzzo, Lynne V.
; APPLICANT: Ross, Douglas D. 

; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 

; TITLE OF INVENTION: encodes it 

; FILE REFERENCE: Ross UMb conversion 

; CURRENT APPLICATION NUMBER: US/09/245,808

; CURRENT FILING DATE: 1999-02-05 

; EARLIER APPLICATION NUMBER: 60/073763

; EARLIER FILING DATE: 1998-02-05 

; NUMBER OF SEQ ID NOS : 7 

; SOFTWARE: PatentIn Ver. 2.0

; SEQ ID NO 2 



LENGTH: 2418 
TYPE: DNA 

ORGANISM: Human MCF-7/AdrVp cells 
US-09-245-808-2 

Query Match 4.3%; Score 115.4; DB 4; Length 2418; 

Best Local Similarity 51.2%; Pred. No. 4.7e-19; 

Matches 269; Conservative 0; Mismatches 256; Indels 0; Gaps 0; 

Qy 548 ACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCA 607 

I I I I I II I I III II I I II I I I I II I I I I I III 

Db 606 ACGTGGTACAAGATGATGTTGTGATGGGCACTCTGACGGTGAGAGAAAACTTACAGTTCT 665 

Qy 608 TTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGG 667 

II I I I I II I I I I I I I I I I I I I II I I 

Db 666 CAGCAGCTCTTCGGCTTGCAACAACTATGACGAATCATGAAAAAAACGAACGGATTAACA 725 

Qy 668 ACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACG 727 

I I I I I II I III I I I II I I I I I II I M 

Db 726 GGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCAGACTCCAAGGTTGGAACTCAGTTTA 785

Qy 728 TGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGA 787

I I I I I I I I I I I I I I I II II I I M I I I I I I I I I 

Db 786 TCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACTAGTATAGGAATGGAGCTTATCACTG 845 

Qy 788 ACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACA 847 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 846 ATCCTTCCATCTTGTTCTTGGATGAGCCTACAACTGGCTTAGACTCAAGCACAGCAAATG 905 

Qy 848 ACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCC 907

I I I III I I I I I I I I II I I I I I I II I I 

Db 906 CTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTC 965

Qy 908 ACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCA 967 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I 

Db 966 ATCAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAA 1025 

Qy 968 CCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACC 1027 

I I I I I I I I I I I I I I I I I I I I II I I I I I 

Db 1026 GACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATC 1085

Qy 1028 CCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGA 1072 

II I I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 1086 ACTGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCA 1130
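
The printed scores can be reproduced from the column counts if each identical column is worth +1.0, each "Conservative" column +0.6, each mismatch -0.6, and each gap costs the header's Gapop (10.0) once plus Gapext (1.0) per indel position. The gap parameters come from the run header; the +1.0 / +0.6 / -0.6 weights are assumptions fitted to the listed results, not documented values, and the function name is hypothetical.

```python
MATCH = 1.0          # assumed weight for an identical column
CONSERVATIVE = 0.6   # assumed weight, fitted to RESULT 9
MISMATCH = -0.6      # assumed weight, fitted to RESULT 1 and RESULT 7
GAP_OPEN = 10.0      # "Gapop 10.0" in the run header
GAP_EXTEND = 1.0     # "Gapext 1.0" in the run header


def reconstruct_score(matches: int, conservative: int, mismatches: int,
                      indels: int, gaps: int) -> float:
    """Rebuild an alignment score from the summary counts of one hit."""
    substitution = (matches * MATCH
                    + conservative * CONSERVATIVE
                    + mismatches * MISMATCH)
    gap_penalty = gaps * GAP_OPEN + indels * GAP_EXTEND
    return substitution - gap_penalty


if __name__ == "__main__":
    # RESULT 1: Matches 269; Mismatches 256; Indels 0; Gaps 0 (printed score 115.4)
    print(round(reconstruct_score(269, 0, 256, 0, 0), 1))
    # ABK51687: Matches 397; Mismatches 333; Indels 4; Gaps 2 (printed score 173.2)
    print(round(reconstruct_score(397, 0, 333, 4, 2), 1))
```

Under these assumed weights the same function also agrees with the printed scores of RESULT 2 through RESULT 10, which is why the weights were chosen; agreement with the listing does not prove they are the program's actual parameters.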



RESULT 2 

US-09-252-991A-14368/C 

; Sequence 14368, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al.

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136

; CURRENT APPLICATION NUMBER: US/09/252,991A



CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 14368 
LENGTH: 747 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-14368 

Query Match 2.3%; Score 61.6; DB 4; Length 747; 

Best Local Similarity 47.6%; Pred. No. 7.6e-06; 

Matches 273; Conservative 0; Mismatches 274; Indels 27; Gaps 2;

Qy 357 GGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAG 416 

II I I I I II III Ml I II I I I I I M I I M M I I I 

Db 732 GGTGGTCAAGGGCGTCGACCTGAGGGTGGACAAGGGCGAGGTGCTGTCGATCATCGGCGG 673 

Qy 417 CTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAG------GTCACGG 470

I I I I I I I III I I I I I I I I I I I I I I I I II I 

Db 672 CTCCGGTTCCGGCAAGTCGACCCTGCTGATGTGCATCAACGGCCTGGAGCCGATCCAGCG 613 

Qy 471 CGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGT 530 

I I I I I I I I II III I I I I I II I II I 

Db 612 CGGCAGCATCCGCGTCGACGGCATCGACGTGCATGCCCGCGGTACCGACCTCAACCGCCT 553 

Qy 531 GAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCG 590 

III I III III I I I II I I I I I I I I I I II I II I I 

Db 552 GCGGCGGAAGATCGGCATCGTCTTCCAGCAGTGGAACGCCTTCCCCCACCTGACCGTGCT 493

Qy 591 AGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCG 650 

Mill II II II III II II I II I II I 
Db 492 GGAAAACGTCATGCTCGCGCCGCGCAAGGTGCTCGGCAAGAGCCGCGCCGAAGCC----- 438

Qy 651 TGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCG 710 

II I M I I I I I II III II III I I 
Db 437 ----------------GAGGCGATGGCGCTGAAGCAACTCACCCACGTCGGTCTCGGCGA 394

Qy 711 CGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGG 770 

I I II I I I I I I I I II I I I I I II III III 

Db 393 CAAGCTCAAGGTCTTCCCCCAGCGCCTTTCCGGCGGCCAGCAACAGCGCATGGCGATCGC 334 

Qy 771 GGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGA 830 

I III I I I I I I II I I I I II I II II I I I I I I I I II II 
Db 333 CCGGGCGCTGGCGATGTCGCCGGAATACATGCTGTTCGACGAAGCCACCTCGGCGCTCGA 274 

Qy 831 CAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCT 890

I I I II I I II I I II I I I I I II I I II I I I I 

Db 273 CCCGCAGTTGGTCGGCGAGGTGGTGGACACCATGCGCATGCTCGCCGAGGAAGGCATGAC 214 

Qy 891 GGTGCTCATCTCCCTCCACCAGCCTCGCTCTGAC 924

I I I I I I I I I I II II I I II 

Db 213 CATGGTCCTGGTCACCCACGAGATCCGCTTCGCC 180 
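
In this hit the Db coordinates run downward (732 to 180) and the result identifier carries a "/C" suffix, which suggests, though the report does not say so, that the query was aligned to the complementary strand of the database entry. Under that assumption, the Db rows as printed would be the reverse complement of the stored sequence. A minimal helper for converting between the two views (the function name is hypothetical):

```python
# Translation table for the four unambiguous DNA bases, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Last Db row of this alignment (positions 213 down to 180 as printed).
    printed = "CATGGTCCTGGTCACCCACGAGATCCGCTTCGCC"
    print(reverse_complement(printed))
```

Applying the function twice returns the original string, so it can be used in either direction.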



RESULT 3 

US-09-252-991A-14337 

Sequence 14337, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al. 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 14337 
LENGTH: 795 
TYPE: DNA

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-14337 

Query Match 2.3%; Score 61.6; DB 4; Length 795; 

Best Local Similarity 47.6%; Pred. No. 7.8e-06; 

Matches 273; Conservative 0; Mismatches 274; Indels 27; Gaps 2; 
Qy 357 GGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAG 416 

II MM II III III I 11 M M II I I I I M I M I 

Db 94 GGTGGTCAAGGGCGTCGACCTGAGGGTGGACAAGGGCGAGGTGCTGTCGATCATCGGCGG 153
Qy 417 CTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAG------GTCACGG 470

I II M I I Ml I II M I I II I I I II I I II I 

Db 154 CTCCGGTTCCGGCAAGTCGACCCTGCTGATGTGCATCAACGGCCTGGAGCCGATCCAGCG 213 

Qy 471 CGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGT 530 

II II I II I II III I II I I II I II I 

Db 214 CGGCAGCATCCGCGTCGACGGCATCGACGTGCATGCCCGCGGTACCGACCTCAACCGCCT 273 

Qy 531 GAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCG 590

III I III III I I II II I I II I I M I I M I II I 

Db 274 GCGGCGGAAGATCGGCATCGTCTTCCAGCAGTGGAACGCCTTCCCCCACCTGACCGTGCT 333 

Qy 591 AGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCG 650 

1 1 I I I II II II I II II II I II I 1 1 1 

Db 334 GGAAAACGTCATGCTCGCGCCGCGCAAGGTGCTCGGCAAGAGCCGCGCCGAAGCC----- 388

Qy 651 TGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCG 710 

II I I M I I I I M III II III I I 

Db 389 ----------------GAGGCGATGGCGCTGAAGCAACTCACCCACGTCGGTCTCGGCGA 432

Qy 711 CGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGG 770 

I I III III III I I I I I M III II I Ml 

Db 433 CAAGCTCAAGGTCTTCCCCCAGCGCCTTTCCGGCGGCCAGCAACAGCGCATGGCGATCGC 492 



Qy 771 GGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGA 830

I III I I I I I I II I II II I M I I I I M II I II I II I 



Db 493 CCGGGCGCTGGCGATGTCGCCGGAATACATGCTGTTCGACGAAGCCACCTCGGCGCTCGA 552

Qy 831 CAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCT 890 

I II II I I I I I I I II II I I I I I I II I I I I 

Db 553 CCCGCAGTTGGTCGGCGAGGTGGTGGACACCATGCGCATGCTCGCCGAGGAAGGCATGAC 612 

Qy 891 GGTGCTCATCTCCCTCCACCAGCCTCGCTCTGAC 924 

I I I I I I I I I I I I I I I I II 

Db 613 CATGGTCCTGGTCACCCACGAGATCCGCTTCGCC 646



RESULT 4 

US-09-252-991A-14340/C 

Sequence 14340, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al.

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 14340 
LENGTH: 1311 
TYPE: DNA 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-14340

Query Match 2.3%; Score 61.6; DB 4; Length 1311; 

Best Local Similarity 47.6%; Pred. No. 9.5e-06; 

Matches 273; Conservative 0; Mismatches 274; Indels 27; Gaps 2; 

Qy 357 GGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAG 416 

II I I II II III III I II I I I I I II I I I I I I II I 

Db 1207 GGTGGTCAAGGGCGTCGACCTGAGGGTGGACAAGGGCGAGGTGCTGTCGATCATCGGCGG 1148 

Qy 417 CTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAG------GTCACGG 470

I I I I I I I III I I I I I I I I I I I I I I I I II I 

Db 1147 CTCCGGTTCCGGCAAGTCGACCCTGCTGATGTGCATCAACGGCCTGGAGCCGATCCAGCG 1088 

Qy 471 CGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGT 530 

I I I I I I I I II III I I I I I II I II I 

Db 1087 CGGCAGCATCCGCGTCGACGGCATCGACGTGCATGCCCGCGGTACCGACCTCAACCGCCT 1028 

Qy 531 GAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCG 590 

III I III III I I I I I I I I I I I I I I I I I I I I I I 

Db 1027 GCGGCGGAAGATCGGCATCGTCTTCCAGCAGTGGAACGCCTTCCCCCACCTGACCGTGCT 968 

Qy 591 AGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCG 650 

I I I I I II II II III II II I I I I I I I 
Db 967 GGAAAACGTCATGCTCGCGCCGCGCAAGGTGCTCGGCAAGAGCCGCGCCGAAGCC----- 913



Qy 651 TGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCG 710 

I I I I I I I I I I II III II III I I 
Db 912 ----------------GAGGCGATGGCGCTGAAGCAACTCACCCACGTCGGTCTCGGCGA 869

Qy 711 CGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGG 770 

I I I I I III III I I I I I II III II I III 

Db 868 CAAGCTCAAGGTCTTCCCCCAGCGCCTTTCCGGCGGCCAGCAACAGCGCATGGCGATCGC 809 

Qy 771 GGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGA 830

I III I I I I I I II I I I I II I II I I I I I I I I I I I I M 
Db 808 CCGGGCGCTGGCGATGTCGCCGGAATACATGCTGTTCGACGAAGCCACCTCGGCGCTCGA 749

Qy 831 CAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCT 890

I II II I I I I I I I II I I I I I I I M I I I I I 

Db 748 CCCGCAGTTGGTCGGCGAGGTGGTGGACACCATGCGCATGCTCGCCGAGGAAGGCATGAC 689 

Qy 891 GGTGCTCATCTCCCTCCACCAGCCTCGCTCTGAC 924 

I I I I I I I I I I I I I I I I II 
Db 688 CATGGTCCTGGTCACCCACGAGATCCGCTTCGCC 655 



RESULT 5 

US-09-252-991A-14279 

; Sequence 14279, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252,991A

; CURRENT FILING DATE: 1999-02-18 

; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

; PRIOR FILING DATE: 1998-07-27 

; NUMBER OF SEQ ID NOS: 33142 

; SEQ ID NO 14279 

LENGTH: 1374 

TYPE: DNA 
; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-14279 

Query Match 2.3%; Score 61.6; DB 4; Length 1374; 

Best Local Similarity 47.6%; Pred. No. 9.7e-06; 

Matches 273; Conservative 0; Mismatches 274; Indels 27; Gaps 2; 

Qy 357 GGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAG 416 

II I I I I II III III I II I I I I I I I I I M M I I I 
Db 696 GGTGGTCAAGGGCGTCGACCTGAGGGTGGACAAGGGCGAGGTGCTGTCGATCATCGGCGG 755 

Qy 417 CTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAG------GTCACGG 470

I I I I I I I III I I I I I I I I I I I MM I M I 

Db 756 CTCCGGTTCCGGCAAGTCGACCCTGCTGATGTGCATCAACGGCCTGGAGCCGATCCAGCG 815 



Qy 471 CGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGT 530 

I I I I I I I I II III I II I I II I II I 

Db 816 CGGCAGCATCCGCGTCGACGGCATCGACGTGCATGCCCGCGGTACCGACCTCAACCGCCT 875 

Qy 531 GAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCG 590 

III I III III I I I I I I I I I II I I I II II I I II 

Db 876 GCGGCGGAAGATCGGCATCGTCTTCCAGCAGTGGAACGCCTTCCCCCACCTGACCGTGCT 935

Qy 591 AGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCG 650 

Mill II II II III II II I I I I I II 
Db 936 GGAAAACGTCATGCTCGCGCCGCGCAAGGTGCTCGGCAAGAGCCGCGCCGAAGCC----- 990

Qy 651 TGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCG 710 

I I I I I I I I I I II Ml II Ml I I 
Db 991 ----------------GAGGCGATGGCGCTGAAGCAACTCACCCACGTCGGTCTCGGCGA 1034

Qy 711 CGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGG 770 

I I III III III I II II I I III II I III 

Db 1035 CAAGCTCAAGGTCTTCCCCCAGCGCCTTTCCGGCGGCCAGCAACAGCGCATGGCGATCGC 1094 

Qy 771 GGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGA 830 

I III I I I I I I II I I I I I I I M I I I M I I I I I I II I 
Db 1095 CCGGGCGCTGGCGATGTCGCCGGAATACATGCTGTTCGACGAAGCCACCTCGGCGCTCGA 1154 

Qy 831 CAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCT 890

I II II I I I II I I I I II I I II I I II I I I I 

Db 1155 CCCGCAGTTGGTCGGCGAGGTGGTGGACACCATGCGCATGCTCGCCGAGGAAGGCATGAC 1214 

Qy 891 GGTGCTCATCTCCCTCCACCAGCCTCGCTCTGAC 924 

I I II I I M II I I II II II 

Db 1215 CATGGTCCTGGTCACCCACGAGATCCGCTTCGCC 1248 



RESULT 6 

US-09-614-912-139 

Sequence 139, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 



; PRIOR FILING DATE: 1999-07-30 

; PRIOR APPLICATION NUMBER: 60/170,906 

; PRIOR FILING DATE: 1999-12-15 

; PRIOR APPLICATION NUMBER: 60/172,959 

; PRIOR FILING DATE: 1999-12-21 

; PRIOR APPLICATION NUMBER: 60/172,946 

; PRIOR FILING DATE: 1999-12-21 

; NUMBER OF SEQ ID NOS : 204 

; SOFTWARE: Microsoft Office 97 

; SEQ ID NO 139 

LENGTH: 4159 

TYPE: DNA 

ORGANISM: Oryza sativa 
US-09-614-912-139 

Query Match 2.3%; Score 61.6; DB 4; Length 4159; 

Best Local Similarity 49.5%; Pred. No. 1.5e-05; 

Matches 188; Conservative 0; Mismatches 189; Indels 3; Gaps 1; 

Qy 697 TGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGG 756 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 411 TGCGCGGACACGATCGTCGGCGACCAGATGCAGAGGGGGATCTCCGGTGGTCAGAAGAAA 470 

Qy 757 AGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCC 816 

I I I I I I I I I I I II I I II III I I I I I I 

Db 471 CGCGTCACCACCGGTGAGATGATTGTCGGTCCAACAAAGGTTCTATTCATGGATGAGATA 530 

Qy 817 ACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTT---GTCCAGGCTG 873

I I I I I I I M I I I I I II I II I I I I I II I I II 
Db 531 TCAACTGGATTGGACAGCTCCACCACATTCCAGATTGTCAAATGCCTTCAGCAAATCGTG 590 

Qy 874 GCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGG 933 

I I I I I I II I I I II I II I MM I M II I II I 

Db 591 CACTTGGGCGAGGCAACCATCCTCATGTCACTCCTACAACCAGCCCCTGAGACTTTTGAG 650 

Qy 934 CTGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAG 993 

I I I I I I I I I II I II I Ml III III II 

Db 651 CTATTCGATGACATTATCCTACTGTCAGAAGGCCAGATTGTTTATCAGGGACCCCGCGAA 710 

Qy 994 CACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCT 1053 

III I III II I II I I I I I I I II I I I II I 

Db 711 TACGTCCTTGAGTTCTTTGAGTCATGCGGATTCCGCTGCCCAGAGCGTAAGGGTACTGCA 770 

Qy 1054 GACTTCTATGTGGACCTGAC 1073 

II II I I III I II I 

Db 771 GACTTTCTTCAGGAGGTGAC 790 



RESULT 7 

US-09-614-912-143 

Sequence 143, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 



APPLICANT: Allen, Steve
APPLICANT: Rafalski, Antoni
APPLICANT: Orozco, Buddy
APPLICANT: Miao, Gou-Hau



APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS : 204 
SOFTWARE: Microsoft Office 97
SEQ ID NO 143 
LENGTH: 1977 
TYPE: DNA 

ORGANISM: Triticum aestivum 
US-09-614-912-143 

Query Match 2.3%; Score 60.8; DB 4; Length 1977; 

Best Local Similarity 47.2%; Pred. No. 1.8e-05; 

Matches 185; Conservative 0; Mismatches 207; Indels 0; Gaps 0; 

Qy 574 CCCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACC 633 

II I I I I I I II I I I II III I I II I II I I I I 

Db 110 CCACATGTGACAATATATGAATCACTCGTATTTTCTGCATGGCTGCGGCTTCCTGCAGAG 169 

Qy 634 TTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGG 693 

I I I III I I I I I I I II I I I I I I 

Db 170 GTTGACTCAGAAAGAAGAAAGATGTTCATCGAGGAGATCATGGATCTTGTAGAGCTCACA 229 

Qy 694 CAGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGC 753 

I I I I I I I I I III I I I I I I I I I I 

Db 230 TCATTGAGGGGGGCACTTGTTGGGCTCCCTGGAGTGAATGGTCTATCAACTGAGCAACGC 289 

Qy 754 AGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAA 813

I I I I II MM II I I I I I I I I I II I I I I I I I I I I I I 

Db 290 AAGAGGCTTACAATTGCCGTGGAGCTTGTTGCTAACCCGTCGATCATTTTTATGGATGAG 349 

Qy 814 CCCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTG 873 

M I I I I I M I I I I II I I I II I I I I M I I 

Db 350 CCAACATCTGGTCTTGATGCTCGTGCAGCTGCAATTGTGATGAGGACTGTTAGGAACACT 409 

Qy 874 GCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGG 933 

I II III II I I II I I I I I M I I I I II I I I I I I 

Db 410 GTTAACACTGGCAGGACCGTTGTTTGCACCATCCACCAGCCAAGTATTGACATATTTGAA 469 



Qy 934 CTGTTTGATCTGGTCCTCCTGATGACGTCTGG 965 

I I I I I I II II II II II I II 

Db 470 GCATTTGATGAGCTTTTCTTGATGAAGAGAGG 501



RESULT 8 

US-09-620-312D-918 

Sequence 918, Application US/09620312D 
Patent No. 6569662 
GENERAL INFORMATION: 
APPLICANT: Tang, Y. Tom 
APPLICANT: Liu, Chenghua 
APPLICANT: Asundi, Vinod 
APPLICANT: Zhang, Jie 
APPLICANT: Ren, Feiyan 
APPLICANT: Chen, Rui-hong 
APPLICANT: Zhao, Qing A. 
APPLICANT: Wehrman, Tom 
APPLICANT: Xue, Aidong J. 
APPLICANT: Yang, Yonghong 
APPLICANT: Wang, Jian-Rui 
APPLICANT: Zhou, Ping 
APPLICANT: Ma, Yunqing 
APPLICANT: Wang, Dunrui 
APPLICANT: Wang, Zhiwei 
APPLICANT: John Tillinghast 
APPLICANT: Drmanac, Radoje T. 

TITLE OF INVENTION: Novel Nucleic Acids and
TITLE OF INVENTION: Polypeptides 
FILE REFERENCE: 784CIP2B 

CURRENT APPLICATION NUMBER: US/09/620,312D
CURRENT FILING DATE: 2000-07-19 
PRIOR APPLICATION NUMBER: 09/552,317 
PRIOR FILING DATE: 2000-04-25 
PRIOR APPLICATION NUMBER: 09/488,725 
PRIOR FILING DATE: 2000-01-21 
NUMBER OF SEQ ID NOS : 1105 
SOFTWARE: pt_FL_genes Version 1.0 
SEQ ID NO 918 
LENGTH: 3376
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 
NAME/KEY: CDS
LOCATION: (1)..(2808)
US-09-620-312D-918 

Query Match 2.1%; Score 55.8; DB 4; Length 3376;

Best Local Similarity 47.7%; Pred. No. 0.00038; 

Matches 340; Conservative 0; Mismatches 322; Indels 51; Gaps 4;

Qy 362 TCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCAG 421 

II II III I II I I I II II II I I I I II I I I I I I I II 

Db 71 TCAAGTGCCTCTCAGGTAAATTCTGCCGCCGGGAGCTGATTGGCATCATGGGCCCCTCAG 130 



Qy 422 GTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATCA 481



Db 131 GGGCTGGCAAGTCTACATT------CATGAACATCTTGGCAGGATACAGGGAGTCTGGAA 184

Qy 482 AGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGTG 541

I I I I I I I I I I I I I I I I Ml I II I I I I II 

Db 185 TGAAGGGGCAGATCCTGGTTAATGGAAGGCCACGGGAGCTGAGGACCTTCCGCAAGATGT 244 

Qy 542 TGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGG 601 

I I I I II I II II I I I I II II I I I III II I I I I 

Db 245 CCTGCTACATCATGCAAGATGACATGCTGCTGCCGCACCTCACGGTGTTGGAAGCCATGA 304 

Qy 602 CCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGG 661 

I I I I I I II II II I I I II I II II 

Db 305 TGGTCTCTGCTAACCTGAAGCTGAGTGAGA------AGCAGGAGGTGAAGAAGGAGCTGG 358

Qy 662 TGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACA 721 

II II I I III I I I I I I I I I I I I I I II I I II 

Db 359 TGACAGAGATCCTGACGGCACTGGGCCTGATGTCGTGCTCCCACACGAGGACAGCCC--- 415

Qy 722 TGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCC 781

I I II I I II I I I I I I I I I I I I I I I I I 

Db 416 ------------TGCTCTCTGGCGGGCAGAGGAAGCGTCTGGCCATCGCCCTGGAGCTGG 463

Qy 782 TGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAG 841 

I I I I I I III I I I I I I I I I I I I I I I I I I I I II II 

Db 464 TCAACAACCCGCCTGTCATGTTCTTTGATGAGCCCACCAGTGGTCTGGATAGCGCCTCTT 523 

Qy 842 CCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCT 901

II I I I I I III I II I I III III I'll I 

Db 524 GTTTCCAAGTGGTGTCCCTCATGAAGTCCCTGGCACAGGGGGGCCGTACCATCATCTGCA 583

Qy 902 CCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGT 961 

I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 584 CCATCCACCAGCCCAGTGCCAAGCTCTTTGAGATGTTTGA-------------------- 623

Qy 962 CTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCG 1021 

II I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 624 ----CAAGTGCATCTTCAAAGGCGTGGTCACCAACCTGATCCCCTATCTAAAGGGACTCG 679

Qy 1022 GCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACC 1074 

III I I I I I I I II I I I I I I I I I I II M III I I I I 

Db 680 GCTTGCATTGCCCCACCTACCACAACCCGGCTGACTTCATCATCGAGGTGGCC 732



RESULT 9 

US-08-232-463-14 

; Sequence 14, Application US/08232463 

; Patent No. 5670367 

; GENERAL INFORMATION: 

; APPLICANT: DORNER, F. 

APPLICANT: SCHEIFLINGER, F. 
; APPLICANT: FALKNER, F. G. 

TITLE OF INVENTION: RECOMBINANT FOWLPOX VIRUS 

NUMBER OF SEQUENCES: 52 

CORRESPONDENCE ADDRESS:

ADDRESSEE: Foley & Lardner . 
; STREET: 1800 Diagonal Road, Suite 500 



CITY: Alexandria 
STATE: VA 
COUNTRY: USA 
ZIP: 22313-0299 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/232,463 
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/07/935,313
FILING DATE: 

APPLICATION NUMBER: EP 91 114 300.6 
FILING DATE: 26-AUG-1991 
ATTORNEY/AGENT INFORMATION: 
NAME: BENT, Stephen A. 
REGISTRATION NUMBER: 29,768 
REFERENCE/DOCKET NUMBER: 30472/114 IMMU
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 836-9300 
TELEFAX: (703) 683-4109
TELEX: 899149 
INFORMATION FOR SEQ ID NO: 14: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 7218 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
IMMEDIATE SOURCE: 
CLONE: pTZgpt-Fls 
US-08-232-463-14 

Query Match 2.0%; Score 54.6; DB 1; Length 7218; 

Best Local Similarity 3.8%; Pred. No. 0.001; 

Matches 15; Conservative 221; Mismatches 155; Indels 0; Gaps 0; 

Qy 1425 GCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAA 1484 

I |....*.. . • . .. ....... ... . . ::::: :::::::: 

Db 1064 GATYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1123

Qy 1485 CGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACT 1544 

::::::: :::::::: :::::: : :::::::: : : 

Db 1124 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1183 

Qy 1545 GGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCC 1604 

Db 1184 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1243

Qy 1605 GGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAG 1664 

Db 1244 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1303 



Qy 1665 GCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTG 1724

Db 1304 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1363



Qy 1725 CAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTT 1784 

Db 1364 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1423 

Qy 1785 CAGCAATGCCCTCTACAACTCCTTCTACCTC 1815 

: : : ::: I I I I I I I I I I I I I 

Db 1424 YYYYYYYYYYYGTACCAAATTCTTCTATCTC 1454 
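
This hit is the only one in the listing with a non-zero "Conservative" count (221), and its Db rows consist mostly of Y, the IUPAC code for a pyrimidine (C or T). A plausible reading, offered here as an assumption rather than documented GenCore behaviour, is that a query C or T opposite a database Y is tallied as conservative, while A or G opposite Y is a mismatch. The sketch below classifies the columns of one ungapped row pair on that basis; the function names are hypothetical.

```python
from collections import Counter

PYRIMIDINES = {"C", "T"}  # bases covered by the IUPAC ambiguity code Y


def classify_column(q: str, d: str) -> str:
    """Classify one ungapped alignment column (assumed rule for Y only)."""
    if q == d:
        return "match"
    if d == "Y" and q in PYRIMIDINES:
        return "conservative"
    return "mismatch"


def tally(query_row: str, db_row: str) -> Counter:
    """Count match / conservative / mismatch columns for two equal-length rows."""
    if len(query_row) != len(db_row):
        raise ValueError("rows must have the same number of columns")
    return Counter(classify_column(q, d) for q, d in zip(query_row, db_row))


if __name__ == "__main__":
    # Final row pair of this alignment (Qy 1785-1815, Db 1424-1454).
    qy = "CAGCAATGCCCTCTACAACTCCTTCTACCTC"
    db = "Y" * 11 + "GTACCAAATTCTTCTATCTC"
    print(dict(tally(qy, db)))
```

For the final row pair this gives 13 matches, 6 conservative columns and 12 mismatches, consistent with the six colons still legible in that row's marker line.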



RESULT 10 
US-09-103-840A-2 

; Sequence 2, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M.

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 

; TITLE OF INVENTION: TUBERCULOSIS 

; FILE REFERENCE: 24366-20007.00 

; CURRENT APPLICATION NUMBER: US/09/103,840A

; CURRENT FILING DATE: 1998-06-24 

; NUMBER OF SEQ ID NOS : 2 

; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2 
; LENGTH: 4403765 
; TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 
; FEATURE:

OTHER INFORMATION: CDC 1551 
; OTHER INFORMATION: "n" bases at various positions throughout the sequence 
; OTHER INFORMATION: represent a, t, c or g 
US-09-103-840A-2 

Query Match 2.0%; Score 54; DB 3; Length 4403765;

Best Local Similarity 49.0%; Pred. No. 0.018; 

Matches 187; Conservative 0; Mismatches 180; Indels 15; Gaps 1; 
Qy 528 GGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGT 587

I I I I I I III MM M I M II MM! I M M I I 

Db 1965623 GCTGCGCAGCAGGATCGGCATGGTGCCACAGGACGACGTGGTGCACGGTCAGCTGACCGT 1965682

Qy 588 GCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCA 647

I I I I II I I II I I II I II I I I I I II I I I 

Db 1965683 GAAACACGCGCTGATGTATGCCGCCGAACTACGGCTGCCGCCGGACACCACCAAAGATGA 1965742

Qy 648 GCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACAC 707

II II III I I I I II MM II II I Mill 

Db 1965743 CCGCACCCAGGTAGTTGCCCGGGTGCTCGAAGAACTCGAGATGTCCAAGCACATCGACAC 1965802



Qy 708 CCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCAT 767

I I I I I I I I I I I I I I M I I I I I I I I I I I I 

Db 1965803 CAGGGTCGACAA---------------GCTGTCGGGTGGTCAACGCAAGCGGGCGTCGGT 1965847

Qy 768 TGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCT 827 

III Mil II II I I II I I I II I I I I I M I I I I I I I I 

Db 1965848 GGCGCTTGAGCTGTTGACCGGGCCGTCACTGCTGATCCTCGACGAGCCGACATCCGGCCT 1965907

Qy 828 CGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCG 887 

II I I I ' I I I I I I II I II I I I II I II 

Db 1965908 AGATCCTGCGCTGGACCGGCAGGTCATGACCATGCTGCGGCAGTTGGCCGACGCCGGTCG 1965967

Qy 888 GCTGGTGCTCATCTCCCTCCAC 909

I I I II I I I I I II II 

Db 1965968 GGTGGTGCTCGTGGTTACCCAC 1965989 



RESULT 11 
US-09-103-840A-1 

; Sequence 1, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M.

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 
; TITLE OF INVENTION: TUBERCULOSIS 
; FILE REFERENCE: 24366-20007.00 
; CURRENT APPLICATION NUMBER: US/09/103,840A
; CURRENT FILING DATE: 1998-06-24 
; NUMBER OF SEQ ID NOS : 2 
; SOFTWARE: Patentin Ver. 2.1 
; SEQ ID NO 1 
; LENGTH: 4411529 
TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 

; OTHER INFORMATION: H37Rv
US-09-103-840A-1 

Query Match 2.0%; Score 54; DB 3; Length 4411529; 

Best Local Similarity 49.0%; Pred. No. 0.018; 

Matches 187; Conservative 0; Mismatches 180; Indels 15; Gaps 1 

Qy 528 GGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGT 587 

I II I I I III I I I I I I I I I M I II I I I I I II II 

Db 1974794 GCTGCGCAGCAGGATCGGCATGGTGCCACAGGACGACGTGGTGCACGGTCAGCTGACCGT 1974853

Qy 588 GCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCA 647 

I I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 1974854 GAAACACGCGCTGATGTATGCCGCCGAACTACGGCTGCCGCCGGACACCACCAAAGATGA 1974913



Qy 648 GCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACAC 707 

II II III I I I I II II I I II II I II I I I 
Db 1974914 CCGCACCCAGGTAGTTGCCCGGGTGCTCGAAGAACTCGAGATGTCCAAGCACATCGACAC 1974973

Qy 708 CCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCAT 767 

I I I I I II I I II I II I I I I I I I I I I I I I I 

Db 1974974 CAGGGTCGACAA---------------GCTGTCGGGTGGTCAACGCAAGCGGGCGTCGGT 1975018

Qy 768 TGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCT 827 

III I I I I II II I I II I I I I I I II I I I I II I I I I I I 
Db 1975019 GGCGCTTGAGCTGTTGACCGGGCCGTCACTGCTGATCCTCGACGAGCCGACATCCGGCCT 1975078

Qy 828 CGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCG 887

II III I I I I I I II I I I I I I I I I II 

Db 1975079 AGATCCTGCGCTGGACCGGCAGGTCATGACCATGCTGCGGCAGTTGGCCGACGCCGGTCG 1975138

Qy 888 GCTGGTGCTCATCTCCCTCCAC 909

I I I I I I I I I I I I I I 

Db 1975139 GGTGGTGCTCGTGGTTACCCAC 1975160 



RESULT 12 

US-09-252-991A-11541 

; Sequence 11541, Application US/09252991A 
; Patent No. 6551795
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 11541
; LENGTH: 723
; TYPE: DNA
; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-11541 

Query Match 1.9%; Score 51.4; DB 4; Length 723; 

Best Local Similarity 46.0%; Pred. No. 0.0026;

Matches 212; Conservative 0; Mismatches 246; Indels 3; Gaps 1 

Qy 468 CGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCT 527

I I I I II I I I I I I I I II I M MINI I 

Db 145 CGCCGGCAAGACCACCACCATCAAGCTGGTCCTCGGCCTGCTGGCCCCCAGCGAAGGCCG 204

Qy 528 GGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGT 587 



1 1 1 1 1 I I I I I I I I 1 1 I 1 1 I I 

Db 205 CGTGCGGGTCCTCGGCCACGATGCGAGGAGCCTGGAGGCGCGCCGCCAGCTCGGCTACCT 264

Qy 588 GCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCA 647 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 265 GCCGGAGAACGTGACCTTCTACCCGCAGCTCAGCGGCGCGGAAACCCTGCGCCACTTCGC 324

Qy 648 GCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACAC 707

II IN I I II I I I I I I I I I II I III M I 

Db 325 CCGCCTCAAGGGCGTGGCGCCGGCCGAAGCCGCGCGCCTGCTGGAACAGGTCGGCCTCGG 384 

Qy 708 CCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCAT 767 

II I III I III I Ml III I I I I II I I I I I 

Db 385 CCATGCAGCCAGGCGGCGCCTGAAAACCTACTCGAAGGGCATGCGCCAGCGCCTCGGCCT 444

Qy 768 TGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCT 827 

I I I I I I I I III I III I I I II I I II I I I I I I I I I 

Db 445 GGCCCAGGCGCTGCTCGGCGAACCGCGCCTGCTGCTGCTCGACGAACCGACGGTGGGCCT 504 

Qy 828 CGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTG---GCCAAAGGCAA 884

I I I I I I I I I I II I I I I I I I I I I M I I 

Db 505 CGACCCGCTGGCCACCGTCGAGCTCTACCAATTGCTCGACCGCCTGCGCGGCCAGGGCAC 564 

Qy 885 CCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACA 925 

I I I I Ml I I II M III 

Db 565 CGGGATCGTCCTTTGCTCCCATGTGCTGCCCGGCGTCGAGA 605 



RESULT 13 

US-09-252-991A-11845/C 

; Sequence 11845, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252,991A

; CURRENT FILING DATE: 1999-02-18 

; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

; PRIOR FILING DATE: 1998-07-27 

; NUMBER OF SEQ ID NOS: 33142

; SEQ ID NO 11845

; LENGTH: 1155

; TYPE: DNA

; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-11845 

Query Match 1.9%; Score 51.4; DB 4; Length 1155; 

Best Local Similarity 46.0%; Pred. No. 0.0031; 

Matches 212; Conservative 0; Mismatches 246; Indels 3; Gaps 1 

Qy 468 CGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCT 527 

II I I I I I I II II I Mil II I I I II I I 



Db 859 CGCCGGCAAGACCACCACCATCAAGCTGGTCCTCGGCCTGCTGGCCCCCAGCGAAGGCCG 800 

Qy 528 GGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGT 587

I I I I I I I I I I I I I I I I I I I I 

Db 799 CGTGCGGGTCCTCGGCCACGATGCGAGGAGCCTGGAGGCGCGCCGCCAGCTCGGCTACCT 740

Qy 588 GCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCA 647

II I I I I I I I I I I I I I I I I I I I I I I I I III I 

Db 739 GCCGGAGAACGTGACCTTCTACCCGCAGCTCAGCGGCGCGGAAACCCTGCGCCACTTCGC 680 

Qy 648 GCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACAC 707 

II III I I I I I I I I I I II I I I I III II I 

Db 679 CCGCCTCAAGGGCGTGGCGCCGGCCGAAGCCGCGCGCCTGCTGGAACAGGTCGGCCTCGG 620

Qy 708 CCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCAT 767 

II I III I III I III III I II I II I I II I 

Db 619 CCATGCAGCCAGGCGGCGCCTGAAAACCTACTCGAAGGGCATGCGCCAGCGCCTCGGCCT 560 

Qy 768 TGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCT 827 

I I II I I I I Ml I III I I I I I I I I I I II I I II I I 

Db 559 GGCCCAGGCGCTGCTCGGCGAACCGCGCCTGCTGCTGCTCGACGAACCGACGGTGGGCCT 500 

Qy 828 CGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTG---GCCAAAGGCAA 884

MM I I II I I I I I I I I I II I I I I M I 

Db 499 CGACCCGCTGGCCACCGTCGAGCTCTACCAATTGCTCGACCGCCTGCGCGGCCAGGGCAC 440 

Qy 885 CCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACA 925 

I I I I II II II I II I II I II II I 

Db 439 CGGGATCGTCCTTTGCTCCCATGTGCTGCCCGGCGTCGAGA 399 
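The "/C" suffix on this result, together with Db coordinates that run downward (859 to 399), is consistent with a hit on the complementary strand of the database entry. As a minimal sketch in Python, assuming the printed coordinates index the stored strand and that a descending range is shown as its reverse complement, the displayed Db text could be recovered as follows. The function names are hypothetical and are not part of the search software.

```python
# Translation table for the four unambiguous bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]

def displayed_segment(db_seq: str, start: int, end: int) -> str:
    """Return the text an alignment line would show for 1-based, inclusive
    coordinates. Assumption: start > end marks a complementary-strand hit."""
    if start <= end:
        return db_seq[start - 1:end]
    return reverse_complement(db_seq[end - 1:start])

if __name__ == "__main__":
    stored = "TTGACCA"                       # toy sequence, not from the report
    print(displayed_segment(stored, 2, 5))   # forward-strand slice
    print(displayed_segment(stored, 5, 2))   # same span, complementary strand
```

Under this reading, RESULT 12 and RESULT 14 show the same aligned text with ascending coordinates because their stored sequences run in the query's orientation.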



RESULT 14 

US-09-252-991A-11600 

; Sequence 11600, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al.

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252,991A

; CURRENT FILING DATE: 1999-02-18 

; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

; PRIOR FILING DATE: 1998-07-27 

; NUMBER OF SEQ ID NOS: 33142

; SEQ ID NO 11600 

; LENGTH: 2367 

; TYPE: DNA 

; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-11600 

Query Match 1.9%; Score 51.4; DB 4; Length 2367; 

Best Local Similarity 46.0%; Pred. No. 0.0042;

Matches 212; Conservative 0; Mismatches 246; Indels 3; Gaps 1



Qy 468 CGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCT 527 

I I I I I I I I I I I I I I I I I II MINI I 

Db 1569 CGCCGGCAAGACCACCACCATCAAGCTGGTCCTCGGCCTGCTGGCCCCCAGCGAAGGCCG 1628 

Qy 528 GGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGT 587

I II II I I I I I I I I II I I I I I 

Db 1629 CGTGCGGGTCCTCGGCCACGATGCGAGGAGCCTGGAGGCGCGCCGCCAGCTCGGCTACCT 1688

Qy 588 GCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCA 647

II I II I I I I I I I I I I I I I I I I I II I I III I 

Db 1689 GCCGGAGAACGTGACCTTCTACCCGCAGCTCAGCGGCGCGGAAACCCTGCGCCACTTCGC 1748 

Qy 648 GCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACAC 707 

II III I II I I I I II I II I I I I III II I 

Db 1749 CCGCCTCAAGGGCGTGGCGCCGGCCGAAGCCGCGCGCCTGCTGGAACAGGTCGGCCTCGG 1808 

Qy 708 CCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCAT 767 

II I III I III I III III I I I I II I I I I I 

Db 1809 CCATGCAGCCAGGCGGCGCCTGAAAACCTACTCGAAGGGCATGCGCCAGCGCCTCGGCCT 1868

Qy 768 TGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCT 827

I I I I I I I I III I III I I I I I I I I I I I I I I I I I I 

Db 1869 GGCCCAGGCGCTGCTCGGCGAACCGCGCCTGCTGCTGCTCGACGAACCGACGGTGGGCCT 1928

Qy 828 CGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTG---GCCAAAGGCAA 884

I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 1929 CGACCCGCTGGCCACCGTCGAGCTCTACCAATTGCTCGACCGCCTGCGCGGCCAGGGCAC 1988 

Qy 885 CCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACA 925 

I I I I I I I I I I I I I I I I I II III 

Db 1989 CGGGATCGTCCTTTGCTCCCATGTGCTGCCCGGCGTCGAGA 2029 



RESULT 15 
US-08-592-874-1/C 

Sequence 1, Application US/08592874 
Patent No. 5854034 
GENERAL INFORMATION: 

APPLICANT: POLLOCK, THOMAS J. 
APPLICANT: YAMAZAKI, MOTOHIDE 
APPLICANT: THORNE, LINDA 
APPLICANT: MIKOLAJCZAK, MARCIA
APPLICANT: ARMENTROUT, RICHARD W. 

TITLE OF INVENTION: DNA SEGMENTS AND METHODS FOR INCREASING 
TITLE OF INVENTION: POLYSACCHARIDE PRODUCTION 
NUMBER OF SEQUENCES: 1 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: JULES E. GOLDBERG
STREET: 261 MADISON AVENUE 
CITY: NEW YORK 
STATE: NY 
COUNTRY: USA 
ZIP: 10016-2391 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 



OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/592,874
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/377,440 
FILING DATE: 24-JAN-1995 
ATTORNEY/AGENT INFORMATION: 
NAME: GOLDBERG, JULES E. 
REGISTRATION NUMBER: 24,408 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 212-986-4090 
TELEFAX: 212-818-9479 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 28804 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: unknown
TOPOLOGY: unknown 
MOLECULE TYPE: DNA (genomic) 
FRAGMENT TYPE: N-terminal
US-08-592-874-1 

Query Match 1.9%; Score 51.4; DB 2; Length 28804; 

Best Local Similarity 47.9%; Pred. No. 0.011; 

Matches 148; Conservative 0; Mismatches 161; Indels 0; Gaps 0; 

Qy 606 CATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGA 665 

I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 19351 CATTGCGCTGTCCAACCCGGCGATGCCGTTCGAGCATGTCGTGGCGGCGGCGACGCTGGC 19292 

Qy 666 GGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTA 725 

I I I I I I I I I I I I I I I I I I I I I I I I 

Db 19291 GGGTGCGCATGACTTCATCCTGCGTCAGCCGCGCGGCTATGACACCGAGATCGTCGAGCG 19232 

Qy 726 CGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTG 785 

II I I I I I I I I I I I I I I II M III III I I 

Db 19231 CGGCGTCAACCTGTCGGGCGGCCAGCGCCAGCGGCTCGCTATCGCCCGCGCGCTGGTCGG 19172 

Qy 786 GAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCA 845 

I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I 

Db 19171 CAATCCGCGCATCCTGGTGTTCGACGAGGCGACCTCCGCGCTGGATGCCGAGAGCGAGGA 19112 

Qy 846 CAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCT 905

I I II I II I I I IN I I II I I I I I I I I I I I I 

Db 19111 GCTGATCCAGAACAATCTGCGCGCCATCTCGGCGGGCCGCACGCTGGTGATCATCGCCCA 19052 

Qy 906 CCACCAGCC 914 

II II I I 
Db 19051 CCGCCTGTC 19043 
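The per-result summary lines above appear to follow simple arithmetic. As a rough sketch in Python (not part of the search output), the figures in this report are reproduced by scoring +1 per match, -0.6 per mismatch, and charging Gapop once per gap plus Gapext once per indel; Query Match is then the score as a percentage of the perfect score (2669), and Best Local Similarity is matches divided by aligned columns. The 0.6 mismatch penalty and the function name are assumptions inferred from the printed numbers, not documented values of the IDENTITY_NUC table.

```python
PERFECT_SCORE = 2669  # "Perfect score" printed in the search header

def alignment_stats(matches, mismatches, indels, gaps,
                    mismatch_penalty=0.6,   # assumption inferred from the report
                    gap_open=10.0, gap_ext=1.0):
    """Recompute (Score, Query Match %, Best Local Similarity %) from counts."""
    score = (matches
             - mismatch_penalty * mismatches
             - gap_open * gaps
             - gap_ext * indels)
    query_match = 100.0 * score / PERFECT_SCORE
    aligned_columns = matches + mismatches + indels
    local_similarity = 100.0 * matches / aligned_columns
    return round(score, 1), round(query_match, 1), round(local_similarity, 1)

if __name__ == "__main__":
    # Counts copied from RESULT 11, RESULT 12 and RESULT 15 above.
    for label, counts in [("RESULT 11", (187, 180, 15, 1)),
                          ("RESULT 12", (212, 246, 3, 1)),
                          ("RESULT 15", (148, 161, 0, 0))]:
        print(label, alignment_stats(*counts))
```

The three tuples should agree with the Score, Query Match, and Best Local Similarity values printed for those results, which is the only evidence offered here for the assumed penalties.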



Search completed: February 26, 2004, 09:46:10 
Job time : 144.121 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: February 21, 2004, 06:40:42 ; Search time 613.323 Seconds

(without alignments) 
15698.623 Million cell updates/sec 

Title: US-09-989-981A-7 
Perfect score: 2669 

Sequence: 1 gtgtccctgctccaggaaac caattaaaaatgtattgagc 2669 



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 



Searched: 2353733 seqs, 1803733377 residues

Total number of hits satisfying chosen parameters: 4707466

Minimum DB seq length: 0

Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%

Maximum Match 100%
Listing first 45 summaries



Database: Published_Applications_NA:*

1: /cgn2_6/ptodata/2/pubpna/US07_PUBCOMB.seq:*
2: /cgn2_6/ptodata/2/pubpna/PCT_NEW_PUB.seq:*
3: /cgn2_6/ptodata/2/pubpna/US06_NEW_PUB.seq:*
4: /cgn2_6/ptodata/2/pubpna/US06_PUBCOMB.seq:*
5: /cgn2_6/ptodata/2/pubpna/US07_NEW_PUB.seq:*
6: /cgn2_6/ptodata/2/pubpna/PCTUS_PUBCOMB.seq:*
7: /cgn2_6/ptodata/2/pubpna/US08_NEW_PUB.seq:*
8: /cgn2_6/ptodata/2/pubpna/US08_PUBCOMB.seq:*
9: /cgn2_6/ptodata/2/pubpna/US09A_PUBCOMB.seq:*
10: /cgn2_6/ptodata/2/pubpna/US09B_PUBCOMB.seq:*
11: /cgn2_6/ptodata/2/pubpna/US09C_PUBCOMB.seq:*
12: /cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq:*
13: /cgn2_6/ptodata/2/pubpna/US10A_PUBCOMB.seq:*
14: /cgn2_6/ptodata/2/pubpna/US10B_PUBCOMB.seq:*
15: /cgn2_6/ptodata/2/pubpna/US10C_PUBCOMB.seq:*
16: /cgn2_6/ptodata/2/pubpna/US10_NEW_PUB.seq:*
17: /cgn2_6/ptodata/2/pubpna/US60_NEW_PUB.seq:*
18: /cgn2_6/ptodata/2/pubpna/US60_PUBCOMB.seq:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.
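Two of the header figures can be cross-checked against each other. The sketch below, in Python, assumes that the query is compared against both strands of every database sequence (a factor of 2); with that assumption the hit count equals twice the number of sequences, and the throughput equals query length times database residues times 2, divided by the search time read as 613.323 seconds. The strand factor and the function names are assumptions, not something the report states.

```python
QUERY_LENGTH = 2669            # "Perfect score" / query length
DB_SEQUENCES = 2_353_733       # "Searched: ... seqs"
DB_RESIDUES = 1_803_733_377    # "Searched: ... residues"
SEARCH_SECONDS = 613.323       # "Search time"
STRANDS = 2                    # assumption: both strands are searched

def total_hits(db_sequences: int, strands: int = STRANDS) -> int:
    """Number of candidate hits when every sequence is searched per strand."""
    return db_sequences * strands

def million_cell_updates_per_sec(query_len: int, db_residues: int,
                                 seconds: float, strands: int = STRANDS) -> float:
    """Dynamic-programming cells filled per second, in millions."""
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6

if __name__ == "__main__":
    print(total_hits(DB_SEQUENCES))
    print(round(million_cell_updates_per_sec(
        QUERY_LENGTH, DB_RESIDUES, SEARCH_SECONDS), 3))
```

Both values land on or very near the printed 4707466 hits and 15698.623 million cell updates per second, which supports reading the search time with a decimal point.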
ALIGNMENTS



RESULT 1 

US-09-989-981A-7 

; Sequence 7, Application US/09989981A 

; Publication No. US20030049730A1

; GENERAL INFORMATION: 

; APPLICANT: Hobbs, Helen H. 

; APPLICANT: Shan, Bei 



; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 7
; LENGTH: 2669
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (100)..(2121)
; OTHER INFORMATION: human ABCG8 (hABCG8)
US-09-989-981A-7 

Query Match 100.0%; Score 2669; DB 10; Length 2669; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 2669; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy       1 GTGTCCCTGCTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCT 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1 GTGTCCCTGCTCCAGGAAACAGAGTGAAGACACTGGCCCTGGCAGGCAGCAGCTGGGTCT 60

Qy      61 AAGAGAGCTGCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAG 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      61 AAGAGAGCTGCAGCCCAGGGTCACAGACCTGTGGGCCCCATGGCCGGGAAGGCGGCAGAG 120

Qy     121 GAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTG 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     121 GAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACCTCGGGCCTCCAGGATAGATTG 180

Qy     181 TTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTG 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     181 TTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTACAGTGGCCAGCCCAACACCCTG 240

Qy     241 GAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAG 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     241 GAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCTCAGGTCCCTTGGTTTGAGCAG 300

Qy     301 CTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGC 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     301 CTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGCCAGAATTCTTGTGAGCTGGGC 360

Qy     361 ATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCA 420
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     361 ATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATGCTGGCCATCATAGGGAGCTCA 420

Qy     421 GGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATC 480
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     421 GGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGCCGAGGTCACGGCGGCAAGATC 480

Qy     481 AAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGT 540
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     481 AAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCGCCTCAGCTGGTGAGGAAGTGT 540

Qy     541 GTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTG 600
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     541 GTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTG 600

Qy     601 GCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGG 660
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     601 GCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGG 660

Qy     661 GTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAAC 720
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     661 GTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAAC 720

Qy     721 ATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTC 780
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     721 ATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTC 780

Qy     781 CTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACA 840
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     781 CTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACA 840

Qy     841 GCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATC 900
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     841 GCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATC 900

Qy     901 TCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACG 960
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     901 TCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACG 960

Qy     961 TCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATC 1020
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     961 TCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATC 1020

Qy    1021 GGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATT 1080
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1021 GGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATT 1080

Qy    1081 GACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCC 1140
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1081 GACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCC 1140

Qy    1141 CTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGAT 1200
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1141 CTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGAT 1200

Qy    1201 CTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCG 1260
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1201 CTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCG 1260

Qy    1261 AGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATT 1320
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1261 AGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATT 1320

Qy    1321 TCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATG 1380
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1321 TCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATG 1380

Qy    1381 TCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGAT 1440
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1381 TCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGAT 1440

Qy    1441 ACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTC 1500
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1441 ACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCTTTCAACGTCATTCTGGATGTC 1500

Qy    1501 ATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTAC 1560
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1501 ATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTAC 1560

Qy    1561 ACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTAC 1620
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1561 ACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTAC 1620

Qy    1621 ATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCC 1680
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1621 ATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCC 1680

Qy    1681 TTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTG 1740
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1681 TTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTG 1740

Qy    1741 GCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTAC 1800
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1741 GCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTAC 1800

Qy    1801 AACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCC 1860
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1801 AACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCC 1860

Qy    1861 GCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAG 1920
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1861 GCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAG 1920

Qy    1921 TTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGAT 1980
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1921 TTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGAT 1980

Qy    1981 AAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTC 2040
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1981 AAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTC 2040

Qy    2041 ATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAG 2100
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2041 ATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAG 2100

Qy    2101 AAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGC 2160
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2101 AAACCAAGTCAAGACTGGTGATTCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGC 2160



Qy    2161 AGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGA 2220
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2161 AGACCCTTCAACTGCACTCCCTCCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGA 2220

Qy    2221 CCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAG 2280
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2221 CCCTACAGATGCTCAGCTACATCCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAG 2280

Qy    2281 GATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGAT 2340
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2281 GATGGCAGTAGAATAAAGACAGTCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGAT 2340

Qy    2341 GACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGA 2400
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2341 GACTGGGAGAAAACCTGCACTCGGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGA 2400

Qy    2401 TATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAG 2460
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2401 TATGCATTTATATAGGCAACTCGATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAG 2460

Qy    2461 CTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTG 2520
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2461 CTAGACTGTGCAGGAATTGTTGGAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTG 2520

Qy    2521 GCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACC 2580
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2521 GCTTCATCTTCCAGGGGCCCCACACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACC 2580

Qy    2581 TAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2581 TAAGATGTACCAGCAAGATGCCATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGC 2640

Qy    2641 CAACGTGAACAATTAAAAATGTATTGAGC 2669
           |||||||||||||||||||||||||||||
Db    2641 CAACGTGAACAATTAAAAATGTATTGAGC 2669
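The header names a Smith-Waterman ("sw") model with the IDENTITY_NUC scoring table, Gapop 10.0 and Gapext 1.0, and the self-match above scores 2669 for 2669 identical bases. A minimal Python sketch of a local-alignment score with affine gaps (Gotoh's recurrence) under those settings is given below. It is illustrative only and is not the GenCore implementation; the +1 match and -0.6 mismatch values are assumptions inferred from the scores in this report, and a gap of length k is assumed to cost Gapop + k x Gapext.

```python
def smith_waterman_score(query: str, target: str,
                         match: float = 1.0,
                         mismatch: float = -0.6,   # assumption, see lead-in
                         gap_open: float = 10.0,
                         gap_ext: float = 1.0) -> float:
    """Best local alignment score; a gap of k bases costs gap_open + k * gap_ext."""
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0.0] * cols        # best score ending at (i-1, j)
    f_prev = [neg_inf] * cols    # best score ending at (i-1, j) in a vertical gap
    best = 0.0
    for i in range(1, len(query) + 1):
        h_cur = [0.0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # best score ending at (i, j) in a horizontal gap
        for j in range(1, cols):
            # Extend an existing gap or open a new one, in each direction.
            e = max(e - gap_ext, h_cur[j - 1] - gap_open - gap_ext)
            f_cur[j] = max(f_prev[j] - gap_ext, h_prev[j] - gap_open - gap_ext)
            sub = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: never let a cell drop below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + sub, e, f_cur[j])
            if h_cur[j] > best:
                best = h_cur[j]
        h_prev, f_prev = h_cur, f_cur
    return best

if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))          # identical strings
    print(smith_waterman_score("ACGTACGT", "ACGTTCGT"))  # one mismatch
```

With these penalties a gap only pays off when it skips a long insertion, since opening one costs as much as roughly seventeen mismatches; this matches the report, where every low-identity hit has at most one gap.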



RESULT 2 
US-10-415-378-29
Sequence 29, Application US/10415378
Publication No. US20040014945A1 
GENERAL INFORMATION: 
APPLICANT: INCYTE CORPORATION; TANG, Y. Tom
APPLICANT: YUE, Henry; NGUYEN, Danniel B.;
APPLICANT: HAFALIA, April J.A.; ELLIOTT, Vicki S.;
APPLICANT: LU, Yan; CHAWLA, Narinder K.;
APPLICANT: YAO, Monique G.; BAUGHN, Mariah R.;
APPLICANT: GANDHI, Ameena R.; DING, Li;
APPLICANT: SANJANWALA, Madhusudan M.; RAMKUMAR, Jayalaxmi;
APPLICANT: ARVIZU, Chandra S.; GIETZEN, Kimberly J.;
APPLICANT: LAL, Preeti G.; AZIMZAI, Yalda;
APPLICANT: KHAN, Farrah A.; THANGAVELU, Kavitha;
APPLICANT: THORNTON, Michael B.; LU, Dyung Aina M.;
APPLICANT: TRIBOULEY, Catherine M.; WARREN, Bridget A.;
APPLICANT: ISON, H. Craig; DAS, Debopriya;
APPLICANT: RAUMANN, Brigitte E.; POLICKY, Jennifer L.;
APPLICANT: KEARNEY, Liam

TITLE OF INVENTION: TRANSPORTERS AND ION CHANNELS 
FILE REFERENCE: PI-0270 USN 

CURRENT APPLICATION NUMBER: US/10/415,378 
CURRENT FILING DATE: 2003-05-07 
PRIOR APPLICATION NUMBER: PCT/US01/46055
PRIOR FILING DATE: 2001-10-27 
PRIOR APPLICATION NUMBER: US 60/250,790 
PRIOR FILING DATE: 2000-12-01 
PRIOR APPLICATION NUMBER: US 60/252,232 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/249,661 
PRIOR FILING DATE: 2000-11-17 
PRIOR APPLICATION NUMBER: US 60/247,673 
PRIOR FILING DATE: 2000-11-09 
PRIOR APPLICATION NUMBER: US 60/245,904 
PRIOR FILING DATE: 2000-11-03 
PRIOR APPLICATION NUMBER: US 60/243,989 
PRIOR FILING DATE: 2000-10-27 
NUMBER OF SEQ ID NOS: 40
SOFTWARE: PERL Program
SEQ ID NO 29
LENGTH: 3239
TYPE: DNA

ORGANISM: Homo sapiens
FEATURE:

NAME/KEY: misc_feature

OTHER INFORMATION: Incyte ID No. US20040014945A1 6585710CB1
US-10-415-378-29 

Query Match 63.0%; Score 1680.6; DB 15; Length 3239; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 1683; Conservative 0; Mismatches 4; Indels 0; Gaps 0;

Qy 983 GGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 1042 

III I I I I I I I I I II I I I I I I I I II I I I I I I I I I II I I I II M I I I I I I I I I II I I I 
Db 12 GGGGCGGCCAGCACATGGTCCATTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACA 71 

Qy 1043 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 1102 

I I I I I I M I I I I I I I II I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I M I I I I I I 
Db 72 GCAATCCTGCTGACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGG 131 

Qy 1103 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 1162 

I I I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I I I I I I I I M I II I I I I I I I 

Db 132 AATTGGCCACCAGGGAGAAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTG 191 

Qy 1163 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 1222 

I I I I I I I I I I I II I I I I M I I M I I I I I I I I I I I I I I I I II I I I II I I I I I I I I II M.I I 
Db 192 ACTTAGATGACTTTCTATGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGG 251 

Qy 1223 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 1282

I I I I I I I I I I I I I I I I I I II M M I I M I I I I I II I I I I I I I I I II I I I I I I I I I II I I I 
Db 252 AAAGCAGCGTGACCCCACTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGG 311

Qy 1283 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 1342 

M I I I I I I I I I I I I I I I I I I I I I I I I I II M I I M I I M I I I M I I I I I I I I I II I I I I I 

Db 312 CGGTGCAGCAGTTTACGACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGC 371 



Qy 


1343 


CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 


1402 




1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 t ) 1 1 1 1 1 1 1 > 1 1 1 1 1 1 t 1 1 1 1 1 1 1 1 

1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M M 1 i M M 




Db 


372 


CCACCCTCCTCATCCATGGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCT 


431 


Qy 


1403 


ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 


1462 




1 1 1 I 1 1 1 1 ■ 1 1 1 I I 1 1 1 t t 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M M 1 M 




Db 


432 


ATTTTGGCCATGGGAGCATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGA 


491 


Qy 


1463 


TCGGTGCTCTCATCCCTTTC7\ACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 


1522 




1 1 1 1 I 1 1 1 1 1 ■ 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 




Db 


492 


TCGGTGCTCTCATCCCTTTCT^CGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGA 


551 


Qy 


1523 


GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 


1582 




llllllllllllllllllltllJIIIItlltlllllll 

1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 




Db 


552 


GGGCAATGCTTTACTATGAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTG 


611 


Qy 


1583 


CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 


1642 


Db 


612 


1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 M 1 M 1 1 

CCAAGATCCTCGGGGAGCTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCA 


671 


Qy 1643 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 1702
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  672 CCTACTGGCTGGCCAACCTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGG 731


Qy 1703 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 1762
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  732 TGTGGCTGGTGGTCTTCTGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCA 791


Qy 1763 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 1822
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  792 CCTTCCACATGGCCTCCTTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGG 851


Qy 1823 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 1882
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  852 GCTTCATGATAAACTTGAGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCT 911


Qy 1883 TCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAA 1942
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  912 TCCTGCGGTGGTGTTTTGAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAA 971


Qy 1943 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 2002
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  972 TGCCTCTCGGGAACCTCACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGC 1031


Qy 2003 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 2062
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1032 TGGACTCGTACCCTCTCTACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCA 1091


Qy 2063 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGAT 2122
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1092 TGGTCCTGTACTACGTGTCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGAT 1151


Qy 2123 TCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCT 2182
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1152 TCACGCCAGACGTCTGCCCGCTGGTGGGGGACCTGAGCAGACCCTTCAACTGCACTCCCT 1211



Qy 2183 CCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACAT 2242
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1212 CCTCAGGAGCCCCTTCCTGGGGACAGTGAGGACAATGACCCTACAGATGCTCAGCTACAT 1271

Qy 2243 CCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAG 2302
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1272 CCGGCCCAGGGTGCTGCAGTGGCACAGACCAGCCACAGGATGGCAGTAGAATAAAGACAG 1331

Qy 2303 TCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTC 2362
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1332 TCGAAAGGGATTTCTGCTCACTGGCAGGAGACTGCGATGACTGGGAGAAAACCTGCACTC 1391

Qy 2363 GGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTC 2422
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1392 GGTGGCACCTACAACGTTGCTAATTTATTTCCTTTTGATATGCATTTATATAGGCAACTC 1451

Qy 2423 GATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTG 2482
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1452 GATATAGGATGGGAGCAAACTAGGAATGAATTGGGTAGCTAGACTGTGCAGGAATTGTTG 1511

Qy 2483 GAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCA 2542
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1512 GAACCTGGAGGGAACAATAACAGTAGCTAGCAGATTTGGCTTCATCTTCCAGGGGCCCCA 1571

Qy 2543 CACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCC 2602
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1572 CACTCCGTGGTGAGCCACCATCAATACAGAAAGTGACCTAAGATGTACCAGCAAGATGCC 1631

Qy 2603 ATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAATTAAAAATGT 2662
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1632 ATCCCTTCTTTTTGTGTGGGGTCATGGGCTCCAAAAGCCAACGTGAACAATTAAAAATGT 1691

Qy 2663 ATTGAGC 2669
        |||||||
Db 1692 ATTGAGC 1698



RESULT 3 

US-09-989-981A-3 

Sequence 3, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION:
APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 3
LENGTH: 2019
TYPE: DNA
ORGANISM: Mus musculus
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(2019)
OTHER INFORMATION: mouse ABCG8 (mABCG8)
US-09-989-981A-3 

Query Match 53.6%; Score 1430; DB 10; Length 2019; 

Best Local Similarity 82.0%; Pred. No. 0; 

Matches 1659; Conservative 0; Mismatches 360; Indels 3; Gaps 1; 

Qy  100 ATGGCCGGGAAGGCGGCAGAGGAGAGAGGGCTGCCGAAAGGGGCCACTCCCCAGGATACC 159
        ||||| | |||  |   ||| ||||    ||||  ||| ||| |    |  |||||| |
Db    1 ATGGCTGAGAAAACCAAAGAAGAGACCCAGCTGTGGAATGGGACTGTACTTCAGGATGCT 60

Qy  160 TCGGGCCTCCAGGATAGATTGTTCTCCTCTGAAAGTGACAACAGCCTGTACTTCACCTAC 219
        |||||||||||||| || ||||||||||| |||||||||||||| |||||||||||||||
Db   61 TCGGGCCTCCAGGACAGCTTGTTCTCCTCGGAAAGTGACAACAGTCTGTACTTCACCTAC 120

Qy  220 AGTGGCCAGCCCAACACCCTGGAGGTCAGAGACCTCAACTACCAGGTGGACCTGGCCTCT 279
        ||||| ||| ||||||| |||||||||||||| |||| ||||||||||||| | ||||||
Db  121 AGTGGTCAGTCCAACACTCTGGAGGTCAGAGATCTCACCTACCAGGTGGACATCGCCTCT 180

Qy  280 CAGGTCCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATGCCCTGGACATCTCCCAGCTGC 339
        ||||| |||||||||||||||||||||||||||||||| |||||||  ||||  ||| ||
Db  181 CAGGTGCCTTGGTTTGAGCAGCTGGCTCAGTTCAAGATACCCTGGAGGTCTCATAGCAGC 240

Qy  340 CAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGCAGATG 399
        ||  | || ||||||||||||||||  || ||||||||||||||||| ||||| ||||||
Db  241 CAAGACTCCTGTGAGCTGGGCATCCGAAATCTAAGCTTCAAAGTGAGGAGTGGACAGATG 300

Qy  400 CTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCACTGGC 459
        ||||||||||||||||||||||| || |||||||||||  | || || |||||||| |||
Db  301 CTGGCCATCATAGGGAGCTCAGGCTGCGGGAGAGCCTCACTACTCGACGTGATCACAGGC 360

Qy  460 CGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCAGCTCG 519
         |||| ||||| |||||||| || ||||| || || ||||| |||||||| |||||  ||
Db  361 AGAGGCCACGGTGGCAAGATGAAATCAGGACAAATTTGGATAAATGGGCAACCCAGTACG 420

Qy  520 CCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAAC 579
        |||||||||||||||||||| || || || ||||| |||||  |||| ||||| ||||||
Db  421 CCTCAGCTGGTGAGGAAGTGCGTTGCGCATGTGCGGCAGCATGACCAACTGCTGCCCAAC 480

Qy  580 TTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCC 639
         |||| ||  |||||||| |||| ||||||||||||||||| |||||||| |||||||||
Db  481 CTGACCGTCAGAGAGACCCTGGCTTTCATTGCCCAGATGCGCCTGCCCAGGACCTTCTCC 540

Qy  640 CAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGC 699
        |||||||||||||||||| ||||||| ||||| ||||| |||||||||||  ||||||||
Db  541 CAGGCCCAGCGTGACAAACGGGTGGAAGACGTAATCGCCGAGCTGCGGCTGCGGCAGTGC 600

Qy  700 GCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGA 759
        ||  ||||| | |||||||||| ||| || || ||| |||| |||||||||||| |  ||
Db  601 GCCAACACCAGAGTGGGCAACACGTATGTACGTGGGGTGTCCGGGGGTGAGCGCCGACGA 660

Qy  760 GTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACC 819
        || |||||||||||||||||||||||||||||||||||||| ||||| || ||||||||
Db  661 GTGAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTCATTCTGGATGAACCCACT 720

Qy  820 TCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAA 879
        ||||| ||||||||||||||||||||||| |||||||  ||||||||| | ||||||||
Db  721 TCTGGCCTCGACAGCTTCACAGCCCACAATCTGGTGACAACCTTGTCCCGCCTGGCCAAG 780

Qy  880 GGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTT 939
        |||||| ||||||||||||||||||||||||||||||||||||||||||||||||| |||
Db  781 GGCAACAGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTATTT 840

Qy  940 GATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATG 999
        || |||||||| |||||||| ||||||||||| |||||| | |||||||| ||||| |||
Db  841 GACCTGGTCCTTCTGATGACATCTGGCACCCCTATCTACCTGGGGGCGGCGCAGCAAATG 900

Qy 1000 GTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTC 1059
        || ||||| |||||| |||| ||| |||| ||||||||||| ||||| ||||| ||||||
Db  901 GTGCAGTACTTCACATCCATTGGCCACCCTTGTCCTCGCTATAGCAACCCTGCGGACTTC 960

Qy 1060 TATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCAGGGAG 1119
        || |||||| |||||||||| ||||| ||||||| ||| | |||  ||||||||  ||||
Db  961 TACGTGGACTTGACCAGCATCGACAGACGCAGCAAAGAACGGGAGGTGGCCACCGTGGAG 1020

Qy 1120 AAGGCTCAGTCACTCGCAGCCCTGTTTCTAGAAAAAGTGCGTGACTTAGATGACTTTCTA 1179
        ||||| ||||| || ||||||||||| ||||||||||| |  | ||| |||||||||||
Db 1021 AAGGCACAGTCTCTTGCAGCCCTGTTCCTAGAAAAAGTACAAGGCTTTGATGACTTTCTG 1080

Qy 1180 TGGAAAGCAGAGACGAAGGATCTTGACGAGGACACCTGTGTGGAAAGCAGCGTGACCCCA 1239
        |||||||| ||| | ||||| ||  ||     ||||      |  |||        | ||
Db 1081 TGGAAAGCTGAGGCAAAGGAACTCAACACAAGCACCCACACAGTCAGCCTGACCCTCACA 1140

Qy 1240 CTAGACACCAACTGCCTCCCGAGTCCTACGAAGATGCCTGGGGCGGTGCAGCAGTTTACG 1299
        |  |||||  ||||      || | ||    || |||| |||  | |  |||||||| |
Db 1141 CAGGACACTGACTG---TGGGACTGCTGTTGAGCTGCCCGGGATGATAGAGCAGTTTTCC 1197

Qy 1300 ACGCTGATCCGTCGTCAGATTTCCAACGACTTCCGAGACCTGCCCACCCTCCTCATCCAT 1359
        || ||||||||||||||||||||||| |||||||| ||||||||||| || ||||| |||
Db 1198 ACCCTGATCCGTCGTCAGATTTCCAATGACTTCCGGGACCTGCCCACGCTGCTCATTCAT 1257

Qy 1360 GGGGCGGAGGCCTGTCTGATGTCAATGACCATCGGCTTCCTCTATTTTGGCCATGGGAGC 1419
        ||| |||| ||||| ||||||||  | | ||| |||||||| || |  |||||||||  |
Db 1258 GGGTCGGAAGCCTGCCTGATGTCCCTCATCATTGGCTTCCTTTACTACGGCCATGGGGCC 1317

Qy 1420 ATCCAGCTCTCCTTCATGGATACAGCCGCCCTCTTGTTCATGATCGGTGCTCTCATCCCT 1479
        |  ||||||||||||||||| ||||| |||||| | |||||||| || || ||||| |||
Db 1318 AAGCAGCTCTCCTTCATGGACACAGCAGCCCTCCTCTTCATGATAGGGGCGCTCATTCCT 1377

Qy 1480 TTCAACGTCATTCTGGATGTCATCTCCAAATGTTACTCAGAGAGGGCAATGCTTTACTAT 1539
        ||||| ||||| ||||||||| ||||||||||| |||| |||||| ||||||| ||||||
Db 1378 TTCAATGTCATCCTGGATGTCGTCTCCAAATGTCACTCGGAGAGGTCAATGCTGTACTAT 1437

Qy 1540 GAACTGGAAGACGGGCTGTACACCACTGGTCCATATTTCTTTGCCAAGATCCTCGGGGAG 1599
        || ||||||||||||||||||||  ||||||| |||||||||||||||||||| || ||
Db 1438 GAGCTGGAAGACGGGCTGTACACTGCTGGTCCTTATTTCTTTGCCAAGATCCTAGGAGAA 1497

Qy 1600 CTTCCGGAGCACTGTGCCTACATCATCATCTACGGGATGCCCACCTACTGGCTGGCCAAC 1659
         | |||||||||||||||||| |||||||||||| |||||||| |||||||||| | |||
Db 1498 TTGCCGGAGCACTGTGCCTACGTCATCATCTACGCGATGCCCATCTACTGGCTGACAAAC 1557

Qy 1660 CTGAGGCCAGGCCTCCAGCCCTTCCTGCTGCACTTCCTGCTGGTGTGGCTGGTGGTCTTC 1719
        ||| |||| |  |   ||| |||||| || ||||||||||| |||||| |||||||||||
Db 1558 CTGCGGCCCGTGCCTGAGCTCTTCCTTCTACACTTCCTGCTCGTGTGGTTGGTGGTCTTC 1617

Qy 1720 TGTTGCAGGATTATGGCCCTGGCCGCCGCGGCCCTGCTCCCCACCTTCCACATGGCCTCC 1779
        || |||||||  ||||||||||| ||| | ||| |||| ||||||||||||||| |||||
Db 1618 TGCTGCAGGACCATGGCCCTGGCTGCCTCTGCCATGCTGCCCACCTTCCACATGTCCTCC 1677

Qy 1780 TTCTTCAGCAATGCCCTCTACAACTCCTTCTACCTCGCCGGGGGCTTCATGATAAACTTG 1839
        |||||| ||||||||||||||||||||||||||||  | |  ||||||||||||||||||
Db 1678 TTCTTCTGCAATGCCCTCTACAACTCCTTCTACCTTACTGCCGGCTTCATGATAAACTTG 1737

Qy 1840 AGCAGCCTGTGGACAGTGCCCGCGTGGATTTCCAAAGTGTCCTTCCTGCGGTGGTGTTTT 1899
          || |||||||| |||||| || ||||| |||||  |||| ||||| |||||||| ||
Db 1738 GACAACCTGTGGATAGTGCCTGCATGGATCTCCAAGCTGTCGTTCCTCCGGTGGTGCTTC 1797

Qy 1900 GAAGGGCTGATGAAGATTCAGTTCAGCAGAAGAACTTATAAAATGCCTCTCGGGAACCTC 1959
           ||||||||| ||||||| || |   ||     ||| |  |  |   |||| ||| ||
Db 1798 TCGGGGCTGATGCAGATTCAATTTAATGGACACCTTTACACCACACAAATCGGCAACTTC 1857

Qy 1960 ACCATCGCGGTCTCAGGAGATAAAATCCTCAGTGCCATGGAGCTGGACTCGTACCCTCTC 2019
        ||| || |  ||   ||||| |  ||  ||||||||||||| ||| ||||| | || |||
Db 1858 ACCTTCTCCATCCTCGGAGACACGATGATCAGTGCCATGGACCTGAACTCGCATCCACTC 1917

Qy 2020 TACGCCATCTACCTCATCGTCATTGGCCTCAGCGGTGGCTTCATGGTCCTGTACTACGTG 2079
        || || ||||||||||| ||||| ||| |||||   |||||| || ||||||||||  |
Db 1918 TATGCGATCTACCTCATTGTCATCGGCATCAGCTACGGCTTCCTGTTCCTGTACTATCTA 1977

Qy 2080 TCCTTAAGGTTCATCAAACAGAAACCAAGTCAAGACTGGTGA 2121
        ||||| | | |||||||||||||  ||| |||||||||||||
Db 1978 TCCTTGAAGCTCATCAAACAGAAGTCAATTCAAGACTGGTGA 2019



RESULT 4 
US-09-837-992-4 

; Sequence 4, Application US/09837992 

; Patent No. US20020081687A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 4
; LENGTH: 2340
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human sitosterolemia gene (SSG)
; NAME/KEY: CDS
; LOCATION: (107)..(2062)
; OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: protein
US-09-837-992-4 



Query Match 7.6%; Score 203.6; DB 9; Length 2340;

Best Local Similarity 54.4%; Pred. No. 2.6e-51;

Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1;



Qy  335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394
        || | ||| | |       || |  | || |  |  |   |||  | |||   || ||||
Db  285 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 344

Qy  395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454
        ||||  ||  |||| |||| |||||||| |  |||| | || |  |||| || |  ||
Db  345 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 404

Qy  455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514
        | ||  |        |||    || |      ||  || | |   | || || | | |
Db  405 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 464

Qy  515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574
             |   ||| | |   | | ||  |  || |||| |  |||  | ||   |||||
Db  465 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 524

Qy  575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634
         || | | || ||||| |||||  ||  || ||  || | | ||    | | | |   |
Db  525 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 584

Qy  635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694
           ||    ||   |   | || | |||||||| ||| || || |||||| | || || |
Db  585 ATCCCGGCTCCTTCC---AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 641

Qy  695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754
        |    || |||   |   | ||||||     | || ||||  | ||   |||||||||
Db  642 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 701

Qy  755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814
        |  | |||  ||| |  |  ||||| ||   | | ||     || |  |  | || || |
Db  702 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 761

Qy  815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874
        | ||| | || || ||| || | || ||  |  |  | ||      | ||      ||||
Db  762 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 821

Qy  875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934
        |     | |||||  | ||| |  || || | |||||||| || |||||  | ||   ||
Db  822 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 881

Qy  935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994
        | |||||     |   | |  |||  |  ||    |  || | ||  ||  || |   |
Db  882 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 941

Qy  995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054
        | ||| |  | |  ||||  | |  ||| ||||| ||||||    |    || |||  ||
Db  942 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 1001

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114
        |||||||| ||||||||||     | || |  |  ||||  || | |||| | |  |||
Db 1002 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 1061

Qy 1115 GGGAGAAGGCTCAG 1128
           |||  |  |||
Db 1062 CCAAGAGAGTCCAG 1075



RESULT 5 

US-09-989-981A-5 

Sequence 5, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION:
APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 5
LENGTH: 2340
TYPE: DNA
ORGANISM: Homo sapiens
FEATURE:
NAME/KEY: CDS
LOCATION: (107)..(2062)
OTHER INFORMATION: human ABCG5 (hABCG5)
US-09-989-981A-5 

Query Match 7.6%; Score 203.6; DB 10; Length 2340; 

Best Local Similarity 54.4%; Pred. No. 2.6e-51; 

Matches 432; Conservative 0; Mismatches 359; Indels 3; Gaps 1; 

Qy  335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394
        || | ||| | |       || |  | || |  |  |   |||  | |||   || ||||
Db  285 GCCGGCAGCAGTGGACCAGGCAGATCCTCAAAGATGTCTCCTTGTACGTGGAGAGCGGGC 344

Qy  395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454
        ||||  ||  |||| |||| |||||||| |  |||| | || |  |||| || |  ||
Db  345 AGATCATGTGCATCCTAGGAAGCTCAGGCTCCGGGAAAACCACGCTGCTGGACGCCATGT 404

Qy  455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514
        | ||  |        |||    || |      ||  || | |   | || || | | |
Db  405 CCGGGAGGCTGGGGCGCGCGGGGACCTTCCTGGGGGAGGTGTATGTGAACGGCCGGGCGC 464

Qy  515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574
             |   ||| | |   | | ||  |  || |||| |  |||  | ||   |||||
Db  465 TGCGCCGGGAGCAGTTCCAGGACTGCTTCTCCTACGTCCTGCAGAGCGACACCCTGCTGA 524

Qy  575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634
         || | | || ||||| |||||  ||  || ||  || | | ||    | | | |   |
Db  525 GCAGCCTCACCGTGCGCGAGACGCTGCACTACACCGCGCTGCTGGCCATCCGCCGCGGCA 584

Qy  635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694
           ||    ||   |   | || | |||||||| ||| || || |||||| | || || |
Db  585 ATCCCGGCTCCTTCC---AGAAGAAGGTGGAGGCCGTCATGGCAGAGCTGAGTCTGAGCC 641

Qy  695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754
        |    || |||   |   | ||||||     | || ||||  | ||   |||||||||
Db  642 ATGTGGCAGACCGACTGATTGGCAACTACAGCTTGGGGGGCATTTCCACGGGTGAGCGGC 701

Qy  755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814
        |  | |||  ||| |  |  ||||| ||   | | ||     || |  |  | || || |
Db  702 GCCGGGTCTCCATCGCAGCCCAGCTGCTCCAGGATCCTAAGGTCATGCTGTTTGATGAGC 761

Qy  815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874
        | ||| | || || ||| || | || ||  |  |  | ||      | ||      ||||
Db  762 CAACCACAGGCCTGGACTGCATGACTGCTAATCAGATTGTCGTCCTCCTGGTGGAACTGG 821

Qy  875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934
        |     | |||||  | ||| |  || || | |||||||| || |||||  | ||   ||
Db  822 CTCGCAGGAACCGAATTGTGGTTCTCACCATTCACCAGCCCCGTTCTGAGCTTTTTCAGC 881

Qy  935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994
        | |||||     |   | |  |||  |  ||    |  || | ||  ||  || |   |
Db  882 TCTTTGACAAAATTGCCATCCTGAGCTTCGGAGAGCTGATTTTCTGTGGCACGCCAGCGG 941

Qy  995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054
        | ||| |  | |  ||||  | |  ||| ||||| ||||||    |    || |||  ||
Db  942 AAATGCTTGATTTCTTCAATGACTGCGGTTACCCTTGTCCTGAACATTCAAACCCTTTTG 1001

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114
        |||||||| ||||||||||     | || |  |  ||||  || | |||| | |  |||
Db 1002 ACTTCTATATGGACCTGACGTCAGTGGATACCCAAAGCAAGGAACGGGAAATAGAAACCT 1061

Qy 1115 GGGAGAAGGCTCAG 1128
           |||  |  |||
Db 1062 CCAAGAGAGTCCAG 1075



RESULT 6 

US-09-989-981A-1 

; Sequence 1, Application US/09989981A 
; Publication No. US20030049730A1 
; GENERAL INFORMATION: 



APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 1
LENGTH: 1959
TYPE: DNA
ORGANISM: Mus musculus
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(1959)
OTHER INFORMATION: mouse ABCG5 (mABCG5)
US-09-989-981A-1 

Query Match 7.2%; Score 193.4; DB 10; Length 1959; 

Best Local Similarity 53.4%; Pred. No. 3.3e-48; 

Matches 429; Conservative 0; Mismatches 371; Indels 3; Gaps 1; 

Qy  335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394
        ||   ||||| |       ||    | || |  |  |   |||  |  |    ||||| |
Db  182 GCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCC 241

Qy  395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454
        ||||  ||  |||| |||| |||||||| |  ||||   || |  |||| || |  |||
Db  242 AGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCT 301

Qy  455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514
        | ||  |    |   ||     || |  |  |||  || | |   | |||||
Db  302 CCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGC 361

Qy  515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574
                 | | | |     | ||  |  || |||| |  |||  | ||    | ||
Db  362 TGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGA 421

Qy  575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634
         || | | |||||||| ||||| |||   | ||  ||   ||||| |   | | |   |
Db  422 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC---GATGCTGGCCCTCTGCCGCA 478

Qy  635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694
         ||||  || |       |||| | ||| ||||  || ||  | |||||| | || || |
Db  479 GCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCC 538

Qy  695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754
        |    || |||       | |||| |    |  |  ||||  | ||  | || |||||
Db  539 ACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGC 598

Qy  755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814
        |  ||||   ||| |  |  || |||||   | ||||     || | || || || || |
Db  599 GCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGC 658

Qy  815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874
        | ||| | || || ||| || | || ||  |  |  | ||      |||| |   |||||
Db  659 CAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGG 718

Qy  875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934
        |     |  ||||  | ||| |  || || |||||||||||||||||||  |||||   |
Db  719 CTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAAC 778

Qy  935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994
          || ||     |   | |  |||| |  ||        | | ||  ||  |  |  ||
Db  779 ACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGG 838

Qy  995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054
        | ||| |    |  ||||    |   || |||||||||||||    |   ||||||   ||
Db  839 AGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTG 898

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114
        | || ||  ||||| ||||     | ||||  |  |||||||||| |||| | |  ||
Db  899 ATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGT 958

Qy 1115 GGGAGAAGGCTCAGTCACTCGCA 1137
           ||   |  |||   || | |
Db  959 ACAAGCGAGTACAGATGCTGGAA 981



RESULT 7 
US-09-837-992-2 

; Sequence 2, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2
; LENGTH: 2258
; TYPE: DNA
; ORGANISM: Mus musculus
; FEATURE:
; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
; NAME/KEY: CDS
; LOCATION: (47)..(2005)
; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: protein
US-09-837-992-2 

Query Match 7.2%; Score 193.4; DB 9; Length 2258; 

Best Local Similarity 53.4%; Pred. No. 3.5e-48; 

Matches 429; Conservative 0; Mismatches 371; Indels 3; Gaps 1;

Qy  335 GCTGCCAGAATTCTTGTGAGCTGGGCATCCAGAACCTAAGCTTCAAAGTGAGAAGTGGGC 394
        ||   ||||| |       ||    | || |  |  |   |||  |  |    ||||| |
Db  228 GCCAGCAGAAGTGGGACAGGCAAATCCTCAAAGATGTCTCCTTGTACATCGAGAGTGGCC 287

Qy  395 AGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTGATCA 454
        ||||  ||  |||| |||| |||||||| |  ||||   || |  |||| || |  |||
Db  288 AGATTATGTGCATCTTAGGCAGCTCAGGCTCAGGGAAGACCACGCTGCTGGACGCCATCT 347

Qy  455 CTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAGCCCA 514
        | ||  |    |   ||     || |  |  |||  || | |   | |||||
Db  348 CCGGGAGGCTGCGGCGCACTGGGACCCTGGAAGGGGAGGTGTTTGTGAATGGCTGCGAGC 407

Qy  515 GCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCC 574
                 | | | |     | ||  |  || |||| |  |||  | ||    | ||
Db  408 TGCGCAGGGACCAGTTCCAAGACTGCTTCTCCTACGTCCTGCAGAGCGACGTTTTTCTGA 467

Qy  575 CCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCT 634
         || | | |||||||| ||||| |||   | ||  ||   ||||| |   | | |   |
Db  468 GCAGCCTCACTGTGCGCGAGACGTTGCGATACACAGC---GATGCTGGCCCTCTGCCGCA 524

Qy  635 TCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGC 694
         ||||  || |       |||| | ||| ||||  || ||  | |||||| | || || |
Db  525 GCTCCGCGGACTTCTACAACAAGAAGGTAGAGGCAGTCATGACAGAGCTGAGCCTGAGCC 584

Qy  695 AGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCA 754
        |    || |||       | |||| |    |  |  ||||  | ||  | || |||||
Db  585 ACGTGGCGGACCAAATGATTGGCAGCTATAATTTTGGGGGAATTTCCAGTGGCGAGCGGC 644

Qy  755 GGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAAC 814
        |  ||||   ||| |  |  || |||||   | ||||     || | || || || || |
Db  645 GCCGAGTTTCCATCGCAGCCCAACTCCTTCAGGACCCCAAGGTCATGATGCTAGATGAGC 704

Qy  815 CCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGG 874
        | ||| | || || ||| || | || ||  |  |  | ||      |||| |   |||||
Db  705 CAACCACAGGACTGGACTGCATGACTGCAAATCAAATTGTCCTTCTCTTGGCTGAGCTGG 764

Qy  875 CCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGC 934
        |     |  ||||  | ||| |  || || |||||||||||||||||||  |||||   |
Db  765 CTCGCAGGGACCGAATTGTGATTGTCACCATCCACCAGCCTCGCTCTGAGCTCTTCCAAC 824

Qy  935 TGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGC 994
          || ||     |   | |  |||| |  ||        | | ||  ||  |  |  ||
Db  825 ACTTCGACAAAATTGCCATCCTGACTTACGGAGAGTTGGTGTTCTGTGGCACCCCAGAGG 884

Qy  995 ACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTG 1054
        | ||| |    |  ||||    |   || |||||||||||||    |   ||||||   ||
Db  885 AGATGCTTGGCTTCTTCAATAACTGTGGTTACCCCTGTCCTGAACATTCCAATCCCTTTG 944

Qy 1055 ACTTCTATGTGGACCTGACCAGCATTGACAGGCGCAGCAGAGAGCAGGAATTGGCCACCA 1114
        | || ||  ||||| ||||     | ||||  |  |||||||||| |||| | |  ||
Db  945 ATTTTTACATGGACTTGACATCAGTGGACACCCAAAGCAGAGAGCGGGAAATAGAAACGT 1004

Qy 1115 GGGAGAAGGCTCAGTCACTCGCA 1137
           ||   |  |||   || | |
Db 1005 ACAAGCGAGTACAGATGCTGGAA 1027



RESULT 8 

US-10-425-114-32175 

Sequence 32175, Application US/10425114 
Publication No. US20040034888A1 
GENERAL INFORMATION: 
APPLICANT: Liu, Jingdong 
APPLICANT: Zhou, Yihua 
APPLICANT: Kovalic, David K. 
APPLICANT: Screen, Steven E 
APPLICANT: Tabaska, Jack E 
APPLICANT: Cao, Yongwei 

TITLE OF INVENTION: Nucleic Acid Molecules and Other Molecules Associated With
TITLE OF INVENTION: Plants and Uses Thereof for Plant Improvement
FILE REFERENCE: 38-21(53313)B
CURRENT APPLICATION NUMBER: US/10/425,114
CURRENT FILING DATE: 2003-04-28
NUMBER OF SEQ ID NOS: 73128
SEQ ID NO 32175
LENGTH: 2585
TYPE: DNA
ORGANISM: Zea mays
FEATURE:

OTHER INFORMATION: Clone ID: UC-ZMFLB73274A02_FLI 
US-10-425-114-32175 

Query Match 6.6%; Score 176.8; DB 12; Length 2585; 

Best Local Similarity 55.3%; Pred. No. 4.9e-43;

Matches 368; Conservative 0; Mismatches 292; Indels 6; Gaps 1; 

Qy 391 GGGCAGATGCTGGCCATCATAGGGAGCTCAGGTTGTGGGAGAGCCTCCTTGCTAGATGTG 450 

III III II I I II I II I I I I I I III I I I I I I I I I I I 
Db 605 GGGTCGCTGACCGCGCTCATGGGGCCCTCGGGGTCCGGCAAGTCCACCCTGCTCGACGCC 664 

Qy 451 ATCACTGGCCGAGGTCACGGCGGCAAGATCAAGTCAGGCCAGATCTGGATCAATGGGCAG 510 

I I I I I II I III I II I I I I I I I I I M M 
Db 665 CTCGCCGGCCGCCTCGCCGCCAACGCCTTCCTCTCCGGCAACGTGCTCCTCAACGG 72 0 

Qy 511 CCCAGCTCGCCTCAGCTGGTGAGGAAGTGTGTGGCCCACGTGCGCCAGCACAACCAGCTG 57 0 

III I I I I I I I I I I I I I M I I I I I I I I I I I 

Db 721 — CCGCAAGGCCAAGCTCTCCTTCGGCGCCGCGGCGTACGTGACGCAGGACGACAACCTG 77 8 

Qy 571 CTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCATTGCCCAGATGCGGCTGCCCAGA 630 

M I I II I I I I M I I I I I I I I I I M I I I I I I I I I I I I 
Db 779 ATCGGGACGCTGACGGTGCGCGAGACGATCGGCTACTCGGCGCTGCTGCGGCTGCCGGAC 838 

Qy 631 ACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGACGTGATCGCGGAGCTGCGGCTT 690 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 



Db 839 AAGATGCCGCGGGAGGACAAGCGCGCGCTGGTGGAGGGCACCATCGTCGAGATGGGGCTG 898 

Qy 691 AGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACGTGCGGGGGTTGTCGGGGGGTGAG 750 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 899 CAGGACTGCGCCGACACCGTCATCGGCAACTGGCACCTCCGCGGGGTCAGCGGCGGCGAG 958 

Qy 751 CGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGAACCCAGGAATCCTTATTCTCGAC 810

I I I I I I I I I I I I I I I I I I I I III I II II I I I I I I I 
Db 959 AAGCGCCGCGTCAGCATCGCGCTCGAGCTACTCATGCGCCCGCGCCTCCTCTTCCTCGAC 1018 

Qy 811 GAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACAACCTGGTGAAGACCTTGTCCAGG 870

I I I I I I I I I I I I I I II II I I III I III I II II II I 
Db 1019 GAGCCCACCAGCGGCCTCGACAGCTCGTCTGCGTTCTTCGTGACGCAGACGCTGCGGGGC 1078 

Qy 871 CTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCCACCAGCCTCGCTCTGACATCTTC 930 

I I I I I I II III I I I I I I I I I I I I I I I I I I II II I I I I 

Db 1079 CTGGCGAGGGACGGCAGGACGGTGATTGCTTCCATCCACCAGCCCAGCAGCGAGGTGTTC 1138

Qy 931 AGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCACCCCCATCTACTTAGGGGCGGCC 990 

I I I II I I I I M I I I I I I I I I I I I I I I I I I I I I I II 

Db 1139 GAGCTCTTCGACATGCTCTTCCTGCTATCCGGGGGCAAGACCGTCTACTTCGGACAAGCA 1198 

Qy 991 CAGCACATGGTCCAGTATTTCACAGCCATCGGCTACCCCTGTCCTCGCTACAGCAATCCT 1050 

III I I I I M I I I I I I M I I I I I I I I I I 

Db 1199 TCGCAAGCATGCGAGTTCTTTGCTCAAGCCGGTTTCCCTTGCCCGGCTCTGCGGAATCCG 1258 

Qy 1051 GCTGAC 1056 

I I I I 

Db 1259 TCCGAC 1264 



RESULT 9 

US-09-866-866A-13 

; Sequence 13, Application US/09866866A 

; Patent No. US20020102244A1 

; GENERAL INFORMATION: 

; APPLICANT: Sorrentino, Brian 

; APPLICANT: Schuetz, John 

; TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells 

; FILE REFERENCE: 1340-1-021CIP2

; CURRENT APPLICATION NUMBER: US/09/866,866A

; CURRENT FILING DATE: 2001-08-30 

; PRIOR APPLICATION NUMBER: 09/584,586 

PRIOR FILING DATE: 2000-05-31 
; PRIOR APPLICATION NUMBER: PCT/US99/11825
; PRIOR FILING DATE: 1999-05-27 
; PRIOR APPLICATION NUMBER: 60/086,988 
; PRIOR FILING DATE: 1998-05-28 
; NUMBER OF SEQ ID NOS: 27

SOFTWARE: PatentIn version 3.0
; SEQ ID NO 13 

LENGTH: 2025 
TYPE: DNA 

ORGANISM: Mus musculus
US-09-866-866A-13 

Query Match 4.6%; Score 122; DB 9; Length 2025; 



Best Local Similarity 51.7%; Pred. No. 3.1e-26; 

Matches 278; Conservative 0; Mismatches 260; Indels 0; Gaps 0;

Qy 538 TGTGTGGCCCACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACC 597 

III I I I I I II I II III II I I I I I II I I I I I I I 

Db 374 TGTTCAGGTTATGTGGTTCAAGATGACGTTGTGATGGGCACCCTGACAGTGAGAGAAAAC 433 

Qy 598 TTGGCCTTCATTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAA 657 

II III II I II I I II I I II I I I III 

Db 434 TTACAGTTCTCAGCAGCTCTTCGACTTCCAACAACTATGAAGAATCATGAAAAAAATGAA 493 

Qy 658 AGGGTGGAGGACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGC 717 

II I I III II I I I I I I I I I I I I I I I 

Db 494 CGGATTAACACAATCATTAAAGAGTTAGGTCTGGAAAAAGTAGCAGATTCTAAGGTCGGA 553

Qy 718 AACATGTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAG 777 

I II I I I II I I I I I M I I II II I I I I I I I I II I I 

Db 554 ACTCAGTTTATCCGTGGCATCTCTGGAGGAGAAAGAAAAAGGACAAGCATAGGGATGGAG 613 

Qy 778 CTCCTGTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTC 837 

II I I I II I I I I I I I I I I I I I I I I I I I II I I II I 

Db 614 CTGATCACTGACCCTTCCATCCTCTTCCTGGATGAGCCCACGACTGGTTTGGACTCAAGC 673 

Qy 838 ACAGCCCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTC 897 

I I I I I I II I I I I I I I I I I I I I II III 

Db 674 ACAGCGAATGCTGTCCTTTTGCTCCTGAAAAGGATGTCTAAACAGGGTCGAACAATCATC 733

Qy 898 ATCTCCCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATG 957 

I I I II I II I I I I I I I I I I I I I I II I I I I I I I II II II II 

Db 734 TTCTCCATTCATCAGCCTCGGTATTCCATCTTTAAGTTGTTTGACAGCCTCACCTTACTG 793 

Qy 958 ACGTCTGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCC 1017 

I I I I I I I I I I I I I I I I I II I I I I I II I I III 

Db 794 GCTTCCGGGAAACTCGTGTTCCATGGGCCAGCACAGAAGGCCTTGGAGTACTTTGCATCA 853

Qy 1018 ATCGGCTACCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCA 1075 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I 
Db 854 GCAGGTTACCACTGTGAGCCCTACAACAACCCTGCGGATTTTTTCCTTGATGTCATCA 911 



RESULT 10 

US-10-424-599-129897 

; Sequence 129897, Application US/10424599 

; Publication No. US20040031072A1 

; GENERAL INFORMATION: 

; APPLICANT: La Rosa Thomas J 

; APPLICANT: Kovalic David K

; APPLICANT: Zhou Yihua 

; APPLICANT: Cao Yongwei 

; TITLE OF INVENTION: Soy Nucleic Acid Molecules and Other Molecules Associated With

; TITLE OF INVENTION: Plants and Uses Thereof for Plant Improvement 

; FILE REFERENCE: 38-21(53223)B

; CURRENT APPLICATION NUMBER: US/10/424,599

; CURRENT FILING DATE: 2003-04-28

; NUMBER OF SEQ ID NOS: 285684

; SEQ ID NO 129897 



LENGTH: 972 
; TYPE: DNA 
; ORGANISM: Glycine max 
; FEATURE:

OTHER INFORMATION: Clone ID: PAT_MRT3847_88305C.1
US-10-424-599-129897 



Query Match 4.3%; Score 115.6; DB 12; Length 972; 

Best Local Similarity 53.7%; Pred. No. 2e-24;

Matches 288; Conservative 0; Mismatches 239; Indels 9; Gaps 2; 

Qy 549 CGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCAT 608

I I I I I I I I I I I I III II I I I I II I II I I I I I I I I I 

Db 401 CGTTCCCCAAGACGACGTTCACTACCCTCACCTCACAGTGTTAGAGACTTTAACCTACGC 460 



Qy 609 TGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGGA 668 

II II I I I I I I I I I I I II II I I II I 

Db 461 AGCGTTATTGAGACTTCCGAAGAGTTTGAGCAGAGAAGAGAAGAAGGAGCACGCGGAGAT 520



Qy 669 CGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGG------GCAACAT 722

I I M I I I I I I I I I I I I I I I I II I I I I I I M I I III 
Db 521 GGTGATTGCGGAGCTAGGGCTAACACGGTGTCGTAACAGCCCCGTTGGAGGGTGCATGGC 580 



Qy 723 GTACGTGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCT 782

I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 581 TCTGTTCCGTGGCATTTCGGGTGGGGAACGGAAACGGGTCAGTATCGGGCAGGAGATGTT 640 



Qy 783 GTGGAACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGC 842 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
Db 641 GGTCAACCCGAGTTTGTTGTTTGTTGATGAGCCCACCTCGGGCTTGGACTCCACCACGGC 700 



Qy 843 CCACAACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTC 902 

III I II I I I I I II I I I I I II II I II I I 

Db 701 CCAACTTATTGTGTCGGTGCTCCGCGGGCTCGCCTTGGCGGGTCGAACCGTCGTCACCAC 760



Qy 903 CCTCCACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTC 962 

I I I I I I II I I I I II I I I II I I I I I I III I I I I I I I 

Db 761 CATCCACCAGCCCTCCAGCCGGTTGTATAGGATGTTTGATAAGGTGGTCGTGTTGTCAGA 820



Qy 963 TGGCACCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGG 1022 

IN I I I I I I I III III I I I I I I I I I I I I I I I I 

Db 821 TGGGTACCCAATTTATAGCGGGCAGGCGGGTCGGGTCATGGACTATCTCGGATCCGTCGG 880 



Qy 1023 CTA---CCCCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGACCA 1075

II III I I I I I I I I I I I I I I M I I M I I I II 

Db 881 ATATGTCCCAGCTTTCAACTTCATGAACCCAGCAGATTTCCTTCTTGACCTTGCTA 936 



RESULT 11 
US-10-405-806-1 

; Sequence 1, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION:

; APPLICANT: KOMATANI, HIDEYA

; APPLICANT: HARA, YOSHIKAZU 

; APPLICANT: KOTANI, HIDEHITO 

; APPLICANT: NAKAGAWA, RINAKO 



TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 
FILE REFERENCE: 234985US0CONT 
CURRENT APPLICATION NUMBER: US/10/405,806
CURRENT FILING DATE: 2003-04-03
PRIOR APPLICATION NUMBER: PCT/JP01/08112
PRIOR FILING DATE: 2001-09-18
PRIOR APPLICATION NUMBER: JP2000-303441
PRIOR FILING DATE: 2000-10-03
NUMBER OF SEQ ID NOS: 17
SOFTWARE: PatentIn version 3.2
SEQ ID NO 1
LENGTH: 2027
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 
NAME/KEY: CDS
LOCATION: (45)..(2009)
US-10-405-806-1 

Query Match 4.3%; Score 115.4; DB 15; Length 2027; 

Best Local Similarity 51.2%; Pred. No. 3.3e-24; 

Matches 269; Conservative 0; Mismatches 256; Indels 0; Gaps 0; 

Qy 548 ACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCA 607 

I I I I I II I I Ml II I II I I II I I II I II I III 

Db 412 ACGTGGTACAAGATGATGTTGTGATGGGCACTCTGACGGTGAGAGAAAACTTACAGTTCT 471 

Qy 608 TTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGG 667

II I I I I I I I I I I I I I I I I I I I I I I I 

Db 472 CAGCAGCTCTTCGGCTTGCAACAACTATGACGAATCATGAAAAAAACGAACGGATTAACA 531 

Qy 668 ACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACG 727 

I I I I I I I I III I I I II I I I Mill II 

Db 532 GGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCAGACTCCAAGGTTGGAACTCAGTTTA 591 

Qy 728 TGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGA 787 

Mill I II I I I M II II II I I I I I I II I I II I 

Db 592 TCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACTAGTATAGGAATGGAGCTTATCACTG 651 

Qy 788 ACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACA 847 

III MM I M II I I II I I I II II II II I II I I 

Db 652 ATCCTTCCATCTTGTTCTTGGATGAGCCTACAACTGGCTTAGACTCAAGCACAGCAAATG 711 

Qy 848 ACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCC 907 

I I I Ml I II II M I M I I M I II M I 

Db 712 CTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTC 771 

Qy 908 ACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCA 967 

II I I M I II II II M II I II II I II II I II II I I II II II 

Db 772 ATCAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAA 831 

Qy 968 CCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACC 1027 

I II II I I II II I II I I I II I M II I I I 

Db 832 GACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATC 891 

Qy 1028 CCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGA 1072 

I II I MM II II I I I I M I II I II I M II 



Db 892 ACTGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCA 936



RESULT 12 
US-10-405-806-12 

; Sequence 12, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU 

; APPLICANT: KOTANI, HIDEHITO

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 

; FILE REFERENCE: 234985US0CONT

; CURRENT APPLICATION NUMBER: US/10/405,806

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112

; PRIOR FILING DATE: 2001-09-18

; PRIOR APPLICATION NUMBER: JP2000-303441

; PRIOR FILING DATE: 2000-10-03 

; NUMBER OF SEQ ID NOS: 17 

SOFTWARE: PatentIn version 3.2 
; SEQ ID NO 12 

LENGTH: 2053 

TYPE: DNA 

ORGANISM: Artificial Sequence 
FEATURE:

; OTHER INFORMATION: ABCG2 482Tmutant sequence

; FEATURE:

; NAME/KEY: CDS

; LOCATION: (32)..(1999)

US-10-405-806-12 

Query Match 4.3%; Score 115.4; DB 15; Length 2053; 

Best Local Similarity 51.2%; Pred. No. 3.4e-24; 

Matches 269; Conservative 0; Mismatches 256; Indels 0; Gaps 0; 

Qy 548 ACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCA 607

I I I I I II I I III II I II I I II II II I II I III 

Db 399 ACGTGGTACAAGATGATGTTGTGATGGGCACTCTGACGGTGAGAGAAAACTTACAGTTCT 458 

Qy 608 TTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGG 667 



Db 459                                                              518

Qy 668 ACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACG 727

Db 519                                                              578

Qy 728 TGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGA 787

Db 579                                                              638

Qy 788 ACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACA 847

Db 639                                                              698



Qy 848 ACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCC 907 

I I I III I M II I I I II I I I I I I I I I I 

Db 699 CTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTC 758 

Qy 908 ACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCA 967 

I I I I I M I I I I II I I I I II I I M I I I II II II I I I I I I I I 

Db 759 ATCAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAA 818 

Qy 968 CCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACC 1027 

I Mil I I I I I I I I I I I II I I II I I II I 

Db 819 GACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATC 878

Qy 1028 CCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGA 1072 

1 1 1 1 I II I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n I I I 

Db 879 ACTGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCA 923



RESULT 13 
US-09-866-866A-26 

Sequence 26, Application US/09866866A 
Patent No. US20020102244A1
GENERAL INFORMATION: 
APPLICANT: Sorrentino, Brian 
APPLICANT: Schuetz, John 

TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells 
FILE REFERENCE: 1340-1-021CIP2 
CURRENT APPLICATION NUMBER: US/09/866,866A
CURRENT FILING DATE: 2001-08-30 
PRIOR APPLICATION NUMBER: 09/584,586 
PRIOR FILING DATE: 2000-05-31 
PRIOR APPLICATION NUMBER: PCT/US99/11825 
PRIOR FILING DATE: 1999-05-27 
PRIOR APPLICATION NUMBER: 60/086,988 
PRIOR FILING DATE: 1998-05-28 
NUMBER OF SEQ ID NOS: 27
SOFTWARE: PatentIn version 3.0
SEQ ID NO 26 
LENGTH: 2247 
TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-866-866A-26 

Query Match 4.3%; Score 115.4; DB 9; Length 2247; 

Best Local Similarity 51.2%; Pred. No. 3.5e-24; 

Matches 269; Conservative 0; Mismatches 256; Indels 0; Gaps 0; 

Qy 548 ACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCA 607 

I I I I I II I I III II I I I I I I I I I I I I I I I III 

Db 561 ACGTGGTACAAGATGATGTTGTGATGGGCACTCTGACGGTGAGAGAAAACTTACAGTTCT 620 

Qy 608 TTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGG 667 

II I I I I I I I I I I I I I I I I I I I I I I I 

Db 621 CAGCAGCTCTTCGGCTTGCAACAACTATGACGAATCATGAAAAAAACGAACGGATTAACA 680 

Qy 668 ACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACG 727 

I I I I I I I I III I I I I I I I I Mill II 

Db 681 GGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCAGACTCCAAGGTTGGAACTCAGTTTA 740



Qy 728 TGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGA 787 

I I I I I I I I I I I I I I I II II I I II II I I I I I I I 

Db 741 TCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACTAGTATAGGAATGGAGCTTATCACTG 800 

Qy 788 ACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACA 847 

III II I I I I I I I I I I I I I I I I Mil I I I I I I I 

Db 801 ATCCTTCCATCTTGTTCTTGGATGAGCCTACAACTGGCTTAGACTCAAGCACAGCAAATG 860

Qy 848 ACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCC 907 

I I I III I I I I I I I I II I I I I I I I I I I 

Db 861 CTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTC 920 

Qy 908 ACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCA 967 

I I I I I II I I I I I I I I I I I I I I I M I I I I M II II I I I I I I 

Db 921 ATCAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAA 980

Qy 968 CCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACC 1027 

I I I I I I I I I II II I I I I I I I 11 I I I I I 

Db 981 GACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATC 1040

Qy 1028 CCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGA 1072

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1041 ACTGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCA 1085 



RESULT 14 
US-09-961-086-2 

; Sequence 2, Application US/09961086 
; Publication No. US20030036645A1 
; GENERAL INFORMATION: 

; APPLICANT: UNIVERSITY OF MARYLAND, BALTIMORE 
; APPLICANT: ROSS, Douglas D. 
; APPLICANT: DOYLE, L. Austin 
; APPLICANT: ABRUZZO, Lynne 

; TITLE OF INVENTION: BREAST CANCER RESISTANCE PROTEIN (BCRP) AND THE DNA 

; TITLE OF INVENTION: WHICH ENCODES IT

; FILE REFERENCE: EP19376-019 

; CURRENT APPLICATION NUMBER: US/09/961,086

; CURRENT FILING DATE: 2001-09-21 

; PRIOR APPLICATION NUMBER: US 60/073,763 

; PRIOR FILING DATE: 1998-02-05 

; PRIOR APPLICATION NUMBER: PCT/US99/02577 

; PRIOR FILING DATE: 1999-02-05 

; NUMBER OF SEQ ID NOS: 7

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 2

LENGTH: 2418 

TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-961-086-2 

Query Match 4.3%; Score 115.4; DB 10; Length 2418; 

Best Local Similarity 51.2%; Pred. No. 3.7e-24; 

Matches 269; Conservative 0; Mismatches 256; Indels 0; Gaps 0; 



Qy 548 ACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCA 607



Db 606 ACGTGGTACAAGATGATGTTGTGATGGGCACTCTGACGGTGAGAGAAAACTTACAGTTCT 665 

Qy 608 TTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGG 667 

II I I I I I I I I I I I I I I I I I M I I I I 

Db 666 CAGCAGCTCTTCGGCTTGCAACAACTATGACGAATCATGAAAAAAACGAACGGATTAACA 725 

Qy 668 ACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACG 727 

II I I II I I III I I I I I I I I I I I I I II 

Db 726 GGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCAGACTCCAAGGTTGGAACTCAGTTTA 785 

Qy 728 TGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGA 787

I I I I I I I II I I II II II II I I I I M MINI I 

Db 786 TCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACTAGTATAGGAATGGAGCTTATCACTG 845 

Qy 788 ACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACA 847

III I II I I I I I I I II I I I I I I MM I II I II I 

Db 846 ATCCTTCCATCTTGTTCTTGGATGAGCCTACAACTGGCTTAGACTCAAGCACAGCAAATG 905 

Qy 848 ACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCC 907 

I I I III I I I I I I I I II I I I I I II I I I 

Db 906 CTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTC 965

Qy 908 ACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCA 967 

I I I I I I I I I I I I I I I I I II I I I I I I I I I M II I I I II I I I 

Db 966 ATCAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAA 1025

Qy 968 CCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACC 1027 

I I I I I I I I I I I I I I I I II I I II I I I I I 

Db 1026 GACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATC 1085 

Qy 1028 CCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGA 1072 

I I I I MM II II I M II I II I I I II I II I 

Db 1086 ACTGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCA 1130 



RESULT 15 
US-09-981-353-34 

; Sequence 34, Application US/09981353 

; Patent No. US20020160382A1 

; GENERAL INFORMATION:

; APPLICANT: Lasek, Amy W. 

; APPLICANT: Jones, David A. 

; TITLE OF INVENTION: GENES EXPRESSED IN COLON CANCER 
; FILE REFERENCE: PA-0038 US 

; CURRENT APPLICATION NUMBER: US/09/981,353
; CURRENT FILING DATE: 2001-10-11
; NUMBER OF SEQ ID NOS: 194

SOFTWARE: PERL Program 
; SEQ ID NO 34 

LENGTH: 2574 

TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE:

NAME/KEY: misc_feature

OTHER INFORMATION: Incyte ID No. US20020160382A1 5517972CB1
US-09-981-353-34 



Query Match 4.3%; Score 115.4; DB 9; Length 2574;

Best Local Similarity 51.2%; Pred. No. 3.8e-24; 

Matches 269; Conservative 0; Mismatches 256; Indels 0; Gaps 0; 

Qy 548 ACGTGCGCCAGCACAACCAGCTGCTCCCCAACTTGACTGTGCGAGAGACCTTGGCCTTCA 607 

I I I I I II I I III II II I I I I I I II I I I II III 

Db 776 ACGTGGTACAAGATGATGTTGTGATGGGCACTCTGACGGTGAGAGAAAACTTACAGTTCT 835 

Qy 608 TTGCCCAGATGCGGCTGCCCAGAACCTTCTCCCAGGCCCAGCGTGACAAAAGGGTGGAGG 667

II I I I I I I I I I I I I I I I I I I I I I I I 

Db 836 CAGCAGCTCTTCGGCTTGCAACAACTATGACGAATCATGAAAAAAACGAACGGATTAACA 895

Qy 668 ACGTGATCGCGGAGCTGCGGCTTAGGCAGTGCGCTGACACCCGCGTGGGCAACATGTACG 727 

I I I I I I I I III I I I I I I I I I I I I I II 

Db 896 GGGTCATTCAAGAGTTAGGTCTGGATAAAGTGGCAGACTCCAAGGTTGGAACTCAGTTTA 955

Qy 728 TGCGGGGGTTGTCGGGGGGTGAGCGCAGGAGAGTCAGCATTGGGGTGCAGCTCCTGTGGA 787 

I I I II I I I I I I I I I I II II I I I I II I I I I I I I 

Db 956 TCCGTGGTGTGTCTGGAGGAGAAAGAAAAAGGACTAGTATAGGAATGGAGCTTATCACTG 1015 

Qy 788 ACCCAGGAATCCTTATTCTCGACGAACCCACCTCTGGGCTCGACAGCTTCACAGCCCACA 847 

III I I I I I I M I I M I I II I I I I II I I I I M I 

Db 1016 ATCCTTCCATCTTGTTCTTGGATGAGCCTACAACTGGCTTAGACTCAAGCACAGCAAATG 1075 

Qy 848 ACCTGGTGAAGACCTTGTCCAGGCTGGCCAAAGGCAACCGGCTGGTGCTCATCTCCCTCC 907

I I I III 1 1 1 1 1 I 1 1 II I 1 1 1 1 1 1 1 I r 

Db 1076 CTGTCCTTTTGCTCCTGAAAAGGATGTCTAAGCAGGGACGAACAATCATCTTCTCCATTC 1135

Qy 908 ACCAGCCTCGCTCTGACATCTTCAGGCTGTTTGATCTGGTCCTCCTGATGACGTCTGGCA 967 

I I I I I I I I I I I I I I M I I I I I I I I I I I I II II I I I I I II I 

Db 1136 ATCAGCCTCGATATTCCATCTTCAAGTTGTTTGATAGCCTCACCTTATTGGCCTCAGGAA 1195 

Qy 968 CCCCCATCTACTTAGGGGCGGCCCAGCACATGGTCCAGTATTTCACAGCCATCGGCTACC 1027 

I I I I I I I I I I I I I I I I MM II I I II I 

Db 1196 GACTTATGTTCCACGGGCCTGCTCAGGAGGCCTTGGGATACTTTGAATCAGCTGGTTATC 1255 

Qy 1028 CCTGTCCTCGCTACAGCAATCCTGCTGACTTCTATGTGGACCTGA 1072 

I I I I II I I M I II II M I I I II I I II I I I 

Db 1256 ACTGTGAGGCCTATAATAACCCTGCAGACTTCTTCTTGGACATCA 1300 



Search completed: February 27, 2004, 07:11:42 